Load the data

fullNHANES_recat <- read_csv(here("cleaned_data","fullNHANES_recat.csv"))
## New names:
## Rows: 22349 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): fpl, age, gender, refED, refEDspouse, childED, adultED, ethnicity,...
## dbl (20): ...1, year, WTINT2YR, SDMVPSU, SDMVSTRA, DMDEDUC3, DMDEDUC2, DMDHR...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

The svydesign function

Before we can start our analyses, we need to use the svydesign function from the “survey” package written by Thomas Lumley. The svydesign function tells R about the design elements in the survey. Once this command has been issued, all that needs to be done for the analyses is to pass the object that contains this information to each command. Because the 2001-2016 NHANES data were released with a sampling weight (WTINT2YR), a PSU variable (SDMVPSU) and a strata variable (SDMVSTRA), we will use these in our svydesign function.

nhc <- svydesign(id=~SDMVPSU, weights=~WTINT2YR,strata=~SDMVSTRA, nest=TRUE, survey.lonely.psu = "adjust", data=fullNHANES_recat)
nhc
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
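
A note on the lonely-PSU setting used above: as far as the survey package documentation goes, survey.lonely.psu is a global option set with options(), and an argument of that name passed to svydesign() appears to be silently absorbed by `...` rather than acted on. The sketch below shows the documented way to set it, assuming the small `nhanes` extract that ships with the survey package (it is not the tutorial's fullNHANES_recat data); `demo_design` is a name made up for illustration.

```r
library(survey)

# Assumption: survey.lonely.psu is a global option (see ?surveyoptions),
# so it is set once with options() rather than inside svydesign().
options(survey.lonely.psu = "adjust")

# Illustrative design on the NHANES extract bundled with the survey package.
data(nhanes, package = "survey")
demo_design <- svydesign(id = ~SDMVPSU, strata = ~SDMVSTRA,
                         weights = ~WTMEC2YR, nest = TRUE, data = nhanes)

# Confirm the option that variance calculations will consult.
getOption("survey.lonely.psu")
```

The option only matters when some stratum ends up with a single PSU; the summary output shows at least two PSUs in every stratum, so the results here should not depend on it.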

We can get additional information about the sample, such as the number of PSUs per stratum, by using the summary function.

summary(nhc)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## Probabilities:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 4.278e-06 2.534e-05 4.579e-05 6.387e-05 8.363e-05 7.468e-04 
## Stratum Sizes: 
##             14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
## obs        173 195 184 216 167 226 202 205 179 221 191 177 190 218 157 169 220
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##             31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
## obs        183 154 184 279 178 177 171 177 159 171 152 183 140 206 177 200 155
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##             48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
## obs        152 236 190 158 177 153 189 175 166 175 129 173 200 244 188 184 205
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##             65  66  67  68  69  70  71  72 73 74  75  76  77  78  79  80  81
## obs        185 185 236 195 151 127 134 146 91 74 229 211 244 216 196 194 213
## design.PSU   2   2   2   2   2   2   2   2  2  2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2  2  2   2   2   2   2   2   2   2
##             82  83  84  85  86  87  88 89  90  91  92  93  94  95  96  97  98
## obs        195 189 178 154 225 147 154 74 240 268 239 152 181 207 171 151 204
## design.PSU   2   2   2   2   3   2   2  2   3   3   3   2   2   2   2   2   2
## actual.PSU   2   2   2   2   3   2   2  2   3   3   3   2   2   2   2   2   2
##             99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
## obs        163 183 208 156  71 194 171 182 184 200 184 171 179 211 182 202 181
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##            116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
## obs        198 140 198 144 227 204 197 171 179 210 243 223 221 230 258 261 270
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##            133
## obs        167
## design.PSU   2
## actual.PSU   2
## Data variables:
##  [1] "...1"        "year"        "WTINT2YR"    "SDMVPSU"     "SDMVSTRA"   
##  [6] "DMDEDUC3"    "DMDEDUC2"    "DMDHREDU"    "DMDHSEDU"    "RIDAGEYR"   
## [11] "RIAGENDR"    "RIDRETH1"    "INDFMPIR"    "DMDYRSUS"    "DMDCITZN"   
## [16] "URXMEP"      "fpl"         "age"         "gender"      "persWeight" 
## [21] "psu"         "strata"      "refED"       "refEDspouse" "childED"    
## [26] "adultED"     "ethnicity"   "citizenship" "yearsUS"     "monoEthyl"

Subpopulation Analysis

Complex survey data are unique. With survey data, you (almost) never get to delete any cases from the data set, even if you will never use them in any of your analyses. Instead, the survey package has two options that allow you to correctly analyze subpopulations of your survey data.

These options are ‘svyby’ and ‘subset.survey.design’.

The subset.survey.design option is sort of like deleting unwanted cases (without really deleting them, of course), and the svyby option is very similar to by-group processing in that the results are shown for each group of the by-variable.

Why deleting cases from a survey data set can be so problematic:

There are two formulas that can be used to calculate the standard errors.

One formula is used when you do by-group processing or delete unwanted cases from the dataset, and survey statisticians call this the conditional approach. This is used when members of the subpopulation cannot appear in certain strata and therefore those strata should not be used in the calculation of the standard error. In practice, this rarely happens in public-use complex survey datasets. One reason is that the analyst usually does not know which combination of variables defines a particular stratum.

The other formula is used when you use the svyby option, and survey statisticians call this the unconditional approach. This is used when members of the subpopulation can be in any of the strata, even if there are some strata in the sample data that do not contain any members of the subpopulation.

Because members of the subpopulation can be in any of the strata, all of the strata need to be used in the calculation of the standard error, and hence all of the data must be in the dataset.

If the data set is subset (meaning that observations not to be included in the subpopulation are deleted from the data set), the standard errors of the estimates cannot be calculated correctly. When the svyby option is used, only the cases defined by the subpopulation are used in the calculation of the estimate, but all cases are used in the calculation of the standard errors.

[For more information on this issue, please see Sampling Techniques, Third Edition by William G. Cochran (1977) and Small Area Estimation by J. N. K. Rao (2003). A nice description of this issue is given in Brady West’s presentation at the 2009 Stata Conference (in Washington, D.C.).]

Both svyby and subset.survey.design use the formula for the unconditional standard errors.
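
The difference between subsetting the design object and deleting rows before declaring the design can be sketched on a small built-in dataset. This is only an illustration, assuming the `apiclus1` cluster sample that ships with the survey package (not the NHANES data used here); the object names are made up for the example.

```r
library(survey)
data(api, package = "survey")   # loads apiclus1, a one-stage cluster sample

# Full design: school districts (dnum) are the sampled clusters.
dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Recommended: subset the design object, so the full design is remembered.
high_design <- subset(dclus1, stype == "H")
m_domain <- svymean(~api00, high_design)

# Not recommended: delete rows first, then declare the design.
high_rows <- apiclus1[apiclus1$stype == "H", ]
naive_design <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc,
                          data = high_rows)
m_naive <- svymean(~api00, naive_design)

# Same point estimate; the standard errors are computed from different
# sets of clusters and so need not agree.
rbind(domain = c(mean = as.numeric(coef(m_domain)), se = as.numeric(SE(m_domain))),
      naive  = c(mean = as.numeric(coef(m_naive)),  se = as.numeric(SE(m_naive))))
```

The point estimates match because the same cases and weights enter the mean. Only the subset of the design object still knows about clusters that contain no high schools, which is what the unconditional standard error requires.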

Mean of age

svymean(~RIDAGEYR, nhc)
##            mean     SE
## RIDAGEYR 38.744 0.2599

Mean of mono-ethyl phthalate

(need to tell R to skip the missing values)

svymean(~URXMEP, nhc, na.rm = TRUE)
##          mean     SE
## URXMEP 269.81 9.4418

Mean of age for males and females.

The variable gender is the subpopulation variable.

svyby(~RIDAGEYR, ~gender, nhc, svymean)
## gender  RIDAGEYR  se
## female  39.5      0.294
## male    38        0.326

The highest grade level of education completed by participants 6-19 y.o. (DMDEDUC3)

Primary = 0:8 Secondary = 9:15

svyby(~DMDEDUC3, ~age, nhc, svymean, na.rm = TRUE)
## age          DMDEDUC3  se
## child        6.06      0.0827
## middle-aged  0         0
## older adult  0         0
## young adult  14.3      0.325

More than one categorical variable can be used to define the subpopulation.

To do so, put + between the variables.

svyby(~RIDAGEYR, ~refED+gender, nhc, svymean)
## refED                      gender  RIDAGEYR  se
## college and beyond         female  39.8      0.53
## partial college and below  female  39.4      0.315
## college and beyond         male    40        0.549
## partial college and below  male    37.4      0.354

Three variables are used.

svyby(~log(monoEthyl), ~refED+citizenship+gender, nhc, svymean, na.rm = TRUE)
## refED                      citizenship              gender  log(monoEthyl)  se
## college and beyond         birth or naturalization  female  3.88            0.0561
## partial college and below  birth or naturalization  female  4.3             0.0335
## college and beyond         not U,S, citizen         female  3.77            0.146
## partial college and below  not U,S, citizen         female  4.55            0.0716
## college and beyond         birth or naturalization  male    3.89            0.0452
## partial college and below  birth or naturalization  male    4.25            0.0314
## college and beyond         not U,S, citizen         male    3.95            0.148
## partial college and below  not U,S, citizen         male    4.67            0.0663

Sometimes you don’t want so much output. Rather, you just want the output for a specific group. You can get this by creating a subpopulation of the data with the subset function. In the example below, we obtain the output only for males.

smale <- subset(nhc,gender == "male")
summary(smale)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, gender == "male")
## Probabilities:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 4.675e-06 2.594e-05 4.641e-05 6.351e-05 8.457e-05 5.394e-04 
## Stratum Sizes: 
##            14 15 16  17 18  19  20 21  22  23 24 25 26  27 28 29  30 31 32 33
## obs        86 87 92 100 85 110 103 95 101 111 88 89 81 114 79 78 111 89 68 92
## design.PSU  2  2  2   2  2   2   2  2   2   2  2  2  2   2  2  2   2  2  2  2
## actual.PSU  2  2  2   2  2   2   2  2   2   2  2  2  2   2  2  2   2  2  2  2
##             34 35 36 37 38 39 40 41 42 43  44 45 46 47 48  49 50 51 52 53 54 55
## obs        135 84 80 78 70 85 81 74 89 78 106 88 86 74 78 119 98 77 97 70 86 85
## design.PSU   2  2  2  2  2  2  2  2  2  2   2  2  2  2  2   2  2  2  2  2  2  2
## actual.PSU   2  2  2  2  2  2  2  2  2  2   2  2  2  2  2   2  2  2  2  2  2  2
##            56 57 58 59 60  61 62 63  64 65 66  67  68 69 70 71 72 73 74  75  76
## obs        88 93 64 85 99 116 98 98 103 83 87 117 101 78 62 66 74 40 32 112 110
## design.PSU  2  2  2  2  2   2  2  2   2  2  2   2   2  2  2  2  2  2  2   2   2
## actual.PSU  2  2  2  2  2   2  2  2   2  2  2   2   2  2  2  2  2  2  2   2   2
##             77  78 79 80  81  82 83  84 85  86 87 88 89  90  91  92 93 94 95 96
## obs        120 108 97 96 115 103 85 100 79 119 72 76 41 123 133 126 65 92 97 83
## design.PSU   2   2  2  2   2   2  2   2  2   3  2  2  2   3   3   3  2  2  2  2
## actual.PSU   2   2  2  2   2   2  2   2  2   3  2  2  2   3   3   3  2  2  2  2
##            97  98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113
## obs        75 101 70 100 113  84  33 101  73  85  86  97  88  89  78  95  86
## design.PSU  2   2  2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU  2   2  2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##            114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130
## obs         97  91  96  64  92  61 121  93 102  75  76 122 121 111 103 111 124
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##            131 132 133
## obs        132 147  79
## design.PSU   2   2   2
## actual.PSU   2   2   2
## Data variables:
##  [1] "...1"        "year"        "WTINT2YR"    "SDMVPSU"     "SDMVSTRA"   
##  [6] "DMDEDUC3"    "DMDEDUC2"    "DMDHREDU"    "DMDHSEDU"    "RIDAGEYR"   
## [11] "RIAGENDR"    "RIDRETH1"    "INDFMPIR"    "DMDYRSUS"    "DMDCITZN"   
## [16] "URXMEP"      "fpl"         "age"         "gender"      "persWeight" 
## [21] "psu"         "strata"      "refED"       "refEDspouse" "childED"    
## [26] "adultED"     "ethnicity"   "citizenship" "yearsUS"     "monoEthyl"
svymean(~RIDAGEYR,design=smale)
##            mean     SE
## RIDAGEYR 37.967 0.3256

log(monoEthyl) level for children

schild <- subset(nhc,age == "child")
summary(schild)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, age == "child")
## Probabilities:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 7.386e-06 3.978e-05 7.794e-05 8.869e-05 1.160e-04 4.555e-04 
## Stratum Sizes: 
##            14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## obs        64 75 78 76 54 75 96 57 69 72 88 71 81 67 71 52 82 52 56 77 92 59 77
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
## obs        70 67 62 72 65 81 56 58 69 72 50 66 90 56 76 51 69 78 78 69 84 53 41
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
## obs        43 73 49 41 69 67 62 50 53 30 47 48 33 25 29 69 59 62 56 55 50 59 70
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
## obs        50 45 44 62 47 56 15 66 83 79 33 65 40 56 48 67 45  59  42  63  19
## design.PSU  2  2  2  3  2  2  2  3  3  3  2  2  2  2  2  2  2   2   2   2   2
## actual.PSU  2  2  2  3  2  2  2  3  3  3  2  2  2  2  2  2  2   2   2   2   2
##            104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
## obs         41  47  59  76  65  57  57  58  50  47  56  52  79  40  70  47 119
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##            121 122 123 124 125 126 127 128 129 130 131 132 133
## obs         94  83  70  70  74 126  89 104 109 120 141 121  49
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2
## Data variables:
##  [1] "...1"        "year"        "WTINT2YR"    "SDMVPSU"     "SDMVSTRA"   
##  [6] "DMDEDUC3"    "DMDEDUC2"    "DMDHREDU"    "DMDHSEDU"    "RIDAGEYR"   
## [11] "RIAGENDR"    "RIDRETH1"    "INDFMPIR"    "DMDYRSUS"    "DMDCITZN"   
## [16] "URXMEP"      "fpl"         "age"         "gender"      "persWeight" 
## [21] "psu"         "strata"      "refED"       "refEDspouse" "childED"    
## [26] "adultED"     "ethnicity"   "citizenship" "yearsUS"     "monoEthyl"
svymean(~log(monoEthyl), design = schild, na.rm = TRUE)
##                  mean     SE
## log(monoEthyl) 3.9152 0.0329

log(monoEthyl) level for young adult

syadult <- subset(nhc,age == "young adult")
summary(syadult)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, age == "young adult")
## Probabilities:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 4.971e-06 2.032e-05 3.711e-05 5.714e-05 5.743e-05 7.468e-04 
## Stratum Sizes: 
##            14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## obs         7 19 23 23 15 36 17 17 18 20 20 24 16 23 19 18 15 22 16 13 21 17 20
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
## obs        19 23  7 26 20 28 23 17 13 24 19 17 32  9 15 29 14 25 11 21 25 12 11
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
## obs        12 15 18 11 21 13 15 21 22 15  5 10 23 10  6 20 14 18 31 23 23 17 19
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
## obs        20 22 11 34 12  9 12 26 24 20 16 25 15 23 16 27  8  11  46  15   7
## design.PSU  2  2  2  3  2  2  2  3  3  3  2  2  2  2  2  2  2   2   2   2   2
## actual.PSU  2  2  2  3  2  2  2  3  3  3  2  2  2  2  2  2  2   2   2   2   2
##            104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
## obs         13  19  22  12  17  18  10  12  22  17  23  19  15   9  20  13  13
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##            121 122 123 124 125 126 127 128 129 130 131 132 133
## obs         17  15   8  10  18  19   9   8  13   7  16  21  19
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2
## Data variables:
##  [1] "...1"        "year"        "WTINT2YR"    "SDMVPSU"     "SDMVSTRA"   
##  [6] "DMDEDUC3"    "DMDEDUC2"    "DMDHREDU"    "DMDHSEDU"    "RIDAGEYR"   
## [11] "RIAGENDR"    "RIDRETH1"    "INDFMPIR"    "DMDYRSUS"    "DMDCITZN"   
## [16] "URXMEP"      "fpl"         "age"         "gender"      "persWeight" 
## [21] "psu"         "strata"      "refED"       "refEDspouse" "childED"    
## [26] "adultED"     "ethnicity"   "citizenship" "yearsUS"     "monoEthyl"
svymean(~log(monoEthyl), design = syadult, na.rm = TRUE)
##                  mean    SE
## log(monoEthyl) 4.3766 0.053

log(monoEthyl) level for middle-aged

smid <- subset(nhc,age == "middle-aged")
summary(smid)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, age == "middle-aged")
## Probabilities:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 4.278e-06 1.553e-05 3.661e-05 4.409e-05 5.261e-05 7.468e-04 
## Stratum Sizes: 
##            14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33  34 35
## obs        69 79 62 87 80 80 59 89 65 89 65 64 75 79 54 62 80 69 61 68 101 60
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2   2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2   2  2
##            36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
## obs        63 63 60 59 55 59 57 41 93 73 64 48 53 77 89 55 81 56 65 63 60 56 48
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            59 60  61 62 63 64 65 66  67 68 69 70 71 72 73 74  75 76  77  78 79
## obs        77 81 106 79 99 87 92 74 109 91 90 53 58 61 43 35 108 91 112 106 97
## design.PSU  2  2   2  2  2  2  2  2   2  2  2  2  2  2  2  2   2  2   2   2  2
## actual.PSU  2  2   2  2  2  2  2  2   2  2  2  2  2  2  2  2   2  2   2   2  2
##            80 81 82 83 84 85 86 87 88 89  90  91  92 93 94  95 96 97 98 99 100
## obs        84 82 84 88 89 71 95 64 65 39 102 131 111 73 61 105 74 70 73 83  87
## design.PSU  2  2  2  2  2  2  3  2  2  2   3   3   3  2  2   2  2  2  2  2   2
## actual.PSU  2  2  2  2  2  2  3  2  2  2   3   3   3  2  2   2  2  2  2  2   2
##            101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
## obs         86  65  34  90  73  76  78  88  66  87  93  96  80  87  85  78  69
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##            118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
## obs         87  58  77  75  83  69  77  76  73  87  79  89  91  74  91  76
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## Data variables:
##  [1] "...1"        "year"        "WTINT2YR"    "SDMVPSU"     "SDMVSTRA"   
##  [6] "DMDEDUC3"    "DMDEDUC2"    "DMDHREDU"    "DMDHSEDU"    "RIDAGEYR"   
## [11] "RIAGENDR"    "RIDRETH1"    "INDFMPIR"    "DMDYRSUS"    "DMDCITZN"   
## [16] "URXMEP"      "fpl"         "age"         "gender"      "persWeight" 
## [21] "psu"         "strata"      "refED"       "refEDspouse" "childED"    
## [26] "adultED"     "ethnicity"   "citizenship" "yearsUS"     "monoEthyl"
svymean(~log(monoEthyl), design = smid, na.rm = TRUE)
##                  mean     SE
## log(monoEthyl) 4.2507 0.0303

log(monoEthyl) level for older adult

soadult <- subset(nhc,age == "older adult")
summary(soadult)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, age == "older adult")
## Probabilities:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 4.675e-06 2.679e-05 4.917e-05 6.462e-05 8.518e-05 5.199e-04 
## Stratum Sizes: 
##            14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## obs        33 22 21 30 18 35 30 42 27 40 18 18 18 49 13 37 43 40 21 26 65 42 17
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
## obs        19 27 31 18  8 17 20 38 22 40 38 16 37 36 12 16 14 21 23 16 10 16 44
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
## obs        64 50 42 33 28 13 34 56 29 16 22 18 29 13  4 32 47 52 23 21 37 55 22
## design.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## actual.PSU  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
##            83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
## obs        31 22 28 34 24 24  8 46 30 29 30 30 47 18 17 37 27  26  34  13  11
## design.PSU  2  2  2  3  2  2  2  3  3  3  2  2  2  2  2  2  2   2   2   2   2
## actual.PSU  2  2  2  3  2  2  2  3  3  3  2  2  2  2  2  2  2   2   2   2   2
##            104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
## obs         50  32  25  18  30  43  17  16  43  38  36  25  26  22  21  26  18
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
##            121 122 123 124 125 126 127 128 129 130 131 132 133
## obs         18  16  24  22  42  25  38  30  19  40  30  37  23
## design.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2
## actual.PSU   2   2   2   2   2   2   2   2   2   2   2   2   2
## Data variables:
##  [1] "...1"        "year"        "WTINT2YR"    "SDMVPSU"     "SDMVSTRA"   
##  [6] "DMDEDUC3"    "DMDEDUC2"    "DMDHREDU"    "DMDHSEDU"    "RIDAGEYR"   
## [11] "RIAGENDR"    "RIDRETH1"    "INDFMPIR"    "DMDYRSUS"    "DMDCITZN"   
## [16] "URXMEP"      "fpl"         "age"         "gender"      "persWeight" 
## [21] "psu"         "strata"      "refED"       "refEDspouse" "childED"    
## [26] "adultED"     "ethnicity"   "citizenship" "yearsUS"     "monoEthyl"
svymean(~log(monoEthyl), design = soadult, na.rm = TRUE)
##                  mean     SE
## log(monoEthyl) 4.1572 0.0451

The young adult category has the highest mean log(monoEthyl) level (4.38), followed by middle-aged (4.25), older adult (4.16) and child (3.92).

Models

A wide variety of statistical models can be run with complex survey data.

With only a few exceptions, the results of these analyses can be interpreted just as the results from the same analyses with experimental or quasi-experimental data.

For example, if you run an OLS regression with weighted data, assuming that the sampling plan has been correctly specified, the regression coefficients are interpreted exactly as any other OLS regression coefficient.

The same is true for the various logistic regression models, including binary logistic regression, ordinal logistic regression and multinomial logistic regression (of which there is not an example in this workshop).

Most of the assumptions of these models are also the same. However, some assumptions, such as the assumption regarding the normality of the residuals in OLS regression, are often not meaningful because of the large sample size commonly seen with complex survey data.
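
To make the point about OLS coefficients concrete, the sketch below compares svyglm with an ordinary weighted lm on the same data. It assumes the `apistrat` stratified sample bundled with the survey package rather than the NHANES data, and the object names are chosen only for this illustration.

```r
library(survey)
data(api, package = "survey")   # loads apistrat, a stratified sample

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

fit_svy <- svyglm(api00 ~ meals, design = dstrat)
fit_wls <- lm(api00 ~ meals, data = apistrat, weights = pw)

# The coefficients agree, so they are read the same way.
cbind(svyglm = coef(fit_svy), lm = coef(fit_wls))

# The standard errors differ, because lm() knows nothing about the design.
cbind(svyglm = sqrt(diag(vcov(fit_svy))),
      lm     = sqrt(diag(vcov(fit_wls))))
```

The interpretation of a coefficient carries over unchanged; what the survey design changes is the uncertainty attached to it.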

t tests

svyttest(log(monoEthyl)~0, nhc, na = TRUE)
## 
##  Design-based one-sample t-test
## 
## data:  log(monoEthyl) ~ 0
## t = 173.02, df = 123, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  4.134424 4.230118
## sample estimates:
##     mean 
## 4.182271

Independent-samples t-test.

svyttest(log(monoEthyl)~refED, nhc)
## 
##  Design-based t-test
## 
## data:  log(monoEthyl) ~ refED
## t = 9.2962, df = 123, p-value = 6.764e-16
## alternative hypothesis: true difference in mean is not equal to 0
## 95 percent confidence interval:
##  0.3289704 0.5069672
## sample estimates:
## difference in mean 
##          0.4179688

As you probably know, an independent-samples t-test tests the null hypothesis that the difference in the means of the two groups is 0. Another way to think about this type of t-test is to think of it as a linear regression with a single binary predictor. The intercept will be the mean of the reference group, and the coefficient will be the difference between the two groups.

We will start by running the t-test function as before, and then replicate the results using the svyglm function, which can be used to run a linear regression. Finally, the svyby function is used with the covmat argument to keep the covariance matrix of the group means, so that the svycontrast function can subtract one mean from the other and compute the standard error of the difference.

The purpose of this example is not to belabor the point about a t-test, but rather to show how to get a matrix of values and then compare those values with the svycontrast function in a simple example where the answer is already known.

svyttest(RIDAGEYR~gender, nhc)
## 
##  Design-based t-test
## 
## data:  RIDAGEYR ~ gender
## t = -4.4888, df = 123, p-value = 1.625e-05
## alternative hypothesis: true difference in mean is not equal to 0
## 95 percent confidence interval:
##  -2.1819236 -0.8464863
## sample estimates:
## difference in mean 
##          -1.514205
summary(svyglm(RIDAGEYR~gender, design=nhc))
## 
## Call:
## svyglm(formula = RIDAGEYR ~ gender, design = nhc)
## 
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  39.4808     0.2937 134.421  < 2e-16 ***
## gendermale   -1.5142     0.3373  -4.489 1.62e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 436.5408)
## 
## Number of Fisher Scoring iterations: 2
a <- svyby(~RIDAGEYR, ~gender, nhc, na.rm.by = TRUE, svymean, covmat = TRUE)
vcov(a)
##            female       male
## female 0.08626588 0.03923044
## male   0.03923044 0.10598469
svycontrast(a, c( -1, 1))
##          contrast     SE
## contrast  -1.5142 0.3373
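
The standard error reported by svycontrast can be checked by hand from the covariance matrix printed by vcov(a). The sketch below uses only base R and the numbers shown in the output; the variable names are made up for the illustration.

```r
# Covariance matrix of the two group means, copied from the vcov(a) output.
V <- matrix(c(0.08626588, 0.03923044,
              0.03923044, 0.10598469),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("female", "male"), c("female", "male")))

# Contrast weights: male mean minus female mean.
w <- c(female = -1, male = 1)

# Var(male - female) = Var(female) + Var(male) - 2 * Cov(female, male)
contrast_var <- drop(t(w) %*% V %*% w)
contrast_se <- sqrt(contrast_var)
round(contrast_se, 4)
```

The covariance term is not zero because the female and male means come from the same PSUs, which is why the covmat = TRUE argument is needed before calling svycontrast.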

Multiple linear regression

We need to use the summary function to get the standard errors, test statistics and p-values. Let’s start with a model that has no interaction terms.
The outcome variable will be log(monoEthyl), and the predictors will be age and refED.

summary(svyglm(log(monoEthyl)~age+refED, design=nhc, na.action = na.omit))
## 
## Call:
## svyglm(formula = log(monoEthyl) ~ age + refED, design = nhc, 
##     na.action = na.omit)
## 
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     3.59738    0.04730  76.059  < 2e-16 ***
## agemiddle-aged                  0.36360    0.03096  11.745  < 2e-16 ***
## ageolder adult                  0.24964    0.05355   4.662 8.19e-06 ***
## ageyoung adult                  0.44357    0.05550   7.992 9.21e-13 ***
## refEDpartial college and below  0.42697    0.04492   9.504 2.61e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.444453)
## 
## Number of Fisher Scoring iterations: 2
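
Because the outcome is on the log scale, a coefficient is easier to read after exponentiating it. The sketch below back-transforms the refED coefficient from the output above using base R only. The interval uses a normal approximation, which is an assumption made here for simplicity; calling confint() on the fitted model would be the more direct route.

```r
# Coefficient and standard error for refED "partial college and below",
# copied from the summary() output above.
b  <- 0.42697
se <- 0.04492

# exp(b) is the ratio of geometric means between the two education groups,
# holding age group fixed.
ratio <- exp(b)

# Rough 95% interval on the ratio scale (normal approximation, an assumption).
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)

round(c(ratio = ratio, lower = ci[1], upper = ci[2]), 2)
```

A ratio of about 1.53 means the geometric mean of monoEthyl is roughly 53% higher when the reference person has partial college or less, compared with college and beyond.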

Interaction

Now let’s add an interaction between the two predictor variables, age and reference person education.

summary(svyglm(log(monoEthyl)~age*refED, design=nhc, na.action = na.omit))
## 
## Call:
## svyglm(formula = log(monoEthyl) ~ age * refED, design = nhc, 
##     na.action = na.omit)
## 
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Coefficients:
##                                               Estimate Std. Error t value
## (Intercept)                                    3.56852    0.06150  58.022
## agemiddle-aged                                 0.40919    0.06808   6.011
## ageolder adult                                 0.23411    0.09114   2.569
## ageyoung adult                                 0.48141    0.11815   4.074
## refEDpartial college and below                 0.46565    0.06444   7.226
## agemiddle-aged:refEDpartial college and below -0.06306    0.07855  -0.803
## ageolder adult:refEDpartial college and below  0.02168    0.09871   0.220
## ageyoung adult:refEDpartial college and below -0.04994    0.12388  -0.403
##                                               Pr(>|t|)    
## (Intercept)                                    < 2e-16 ***
## agemiddle-aged                                2.15e-08 ***
## ageolder adult                                  0.0115 *  
## ageyoung adult                                8.43e-05 ***
## refEDpartial college and below                5.48e-11 ***
## agemiddle-aged:refEDpartial college and below   0.4237    
## ageolder adult:refEDpartial college and below   0.8266    
## ageyoung adult:refEDpartial college and below   0.6876    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.444224)
## 
## Number of Fisher Scoring iterations: 2
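
None of the three interaction coefficients above is individually significant, but an interaction with several coefficients is usually judged with a single joint test. A minimal sketch using regTermTest from the survey package follows, run on the bundled `apistrat` data with an illustrative model rather than the NHANES model above.

```r
library(survey)
data(api, package = "survey")   # loads apistrat, a stratified sample

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Illustrative model: a numeric predictor interacted with a 3-level factor.
fit_int <- svyglm(api00 ~ meals * stype, design = dstrat)

# Wald test that all meals:stype interaction coefficients are zero.
int_test <- regTermTest(fit_int, ~meals:stype)
int_test
```

The same call pattern, with the term written as ~age:refED, would be the natural way to test the interaction in the model above, assuming the fitted object has been saved under a name.
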
glm1 <- (svyglm(log(monoEthyl)~gender+refED, design=nhc, na.action = na.omit))
glm1
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ gender + refED, design = nhc, 
##     na.action = na.omit)
## 
## Coefficients:
##                    (Intercept)                      gendermale  
##                        3.89173                        -0.01296  
## refEDpartial college and below  
##                        0.41771  
## 
## Degrees of Freedom: 20686 Total (i.e. Null);  122 Residual
##   (1662 observations deleted due to missingness)
## Null Deviance:       51790 
## Residual Deviance: 51050     AIC: 84630

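This example is similar to the previous one, only here factor notation is used. Wrapping a predictor in factor() matters most when a categorical variable is stored as numeric codes, because R would otherwise fit a single slope for it; character columns such as gender and ethnicity are converted to factors automatically.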

summary(svyglm(log(monoEthyl)~factor(gender)+factor(ethnicity), design=nhc, na.action = na.omit))
## 
## Call:
## svyglm(formula = log(monoEthyl) ~ factor(gender) + factor(ethnicity), 
##     design = nhc, na.action = na.omit)
## 
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                          4.393113   0.049860  88.109  < 2e-16 ***
## factor(gender)male                  -0.009393   0.027889  -0.337   0.7368    
## factor(ethnicity)Non-Hispanic Black  0.513229   0.065652   7.817 2.41e-12 ***
## factor(ethnicity)Non-Hispanic White -0.354993   0.061970  -5.728 7.77e-08 ***
## factor(ethnicity)Other Hispanic      0.141275   0.074951   1.885   0.0619 .  
## factor(ethnicity)Other or Multi     -0.631021   0.083302  -7.575 8.53e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.404013)
## 
## Number of Fisher Scoring iterations: 2
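
To make the effect of factor() concrete, here is a minimal sketch on toy data (hypothetical values, not NHANES) using plain lm(); the same formula handling should apply to svyglm. Note that gender and ethnicity were read in as character columns, which R converts to factors automatically in a model formula, so factor() is likely redundant for them here.

```r
# Toy data (hypothetical, not NHANES): a three-level group stored as numeric codes
toy <- data.frame(
  y     = c(1.0, 1.2, 2.1, 2.3, 3.9, 4.1),
  group = c(1, 1, 2, 2, 3, 3)
)

# Without factor(): a single slope, because group is treated as continuous
fit_numeric <- lm(y ~ group, data = toy)

# With factor(): one indicator coefficient per non-reference level
fit_factor <- lm(y ~ factor(group), data = toy)

names(coef(fit_numeric))  # "(Intercept)" "group"
names(coef(fit_factor))   # "(Intercept)" "factor(group)2" "factor(group)3"

# Character columns are converted to factors automatically inside a formula
toy$group_chr <- c("a", "a", "b", "b", "c", "c")
fit_chr <- lm(y ~ group_chr, data = toy)
names(coef(fit_chr))      # "(Intercept)" "group_chrb" "group_chrc"
```

The first level in alphabetical order becomes the reference category, which is why "Mexican American" does not appear among the ethnicity coefficients above.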

Forest Models

Forest model for Model A:

Multivariable regression analysis of socio-demographic variables and phthalate levels (log-transformed monoEthyl)

modela <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)

summ(modela)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.09 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.71   0.08    49.10   0.00
## refEDpartial college and           0.35   0.05     7.77   0.00
## below                                                         
## agemiddle-aged                     0.36   0.03    10.78   0.00
## ageolder adult                     0.35   0.06     6.08   0.00
## ageyoung adult                     0.44   0.05     8.16   0.00
## gendermale                        -0.01   0.03    -0.17   0.86
## ethnicityNon-Hispanic              0.57   0.07     8.37   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.29   0.06    -4.45   0.00
## White                                                         
## ethnicityOther Hispanic            0.15   0.08     1.92   0.06
## ethnicityOther or Multi           -0.57   0.08    -6.84   0.00
## fplfamily income 2x poverty        0.03   0.04     0.70   0.48
## threshold                                                     
## fplfamily income 3x poverty        0.06   0.05     1.10   0.27
## threshold                                                     
## fplfamily income 4x poverty        0.11   0.06     1.83   0.07
## threshold                                                     
## fplfamily income 5x poverty        0.12   0.06     1.87   0.06
## threshold                                                     
## fplfamily income more than         0.09   0.06     1.55   0.13
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.17   0.06     2.91   0.00
## citizen                                                       
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
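summ(modela, robust = "HC1") # robust is ignored for svyglm objects: design-based (robust) standard errors are already the default, hence the warning below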
## Warning in summ.svyglm(modela, robust = "HC1"): Robust standard errors are reported by default
##  in the survey package.
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.09 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.71   0.08    49.10   0.00
## refEDpartial college and           0.35   0.05     7.77   0.00
## below                                                         
## agemiddle-aged                     0.36   0.03    10.78   0.00
## ageolder adult                     0.35   0.06     6.08   0.00
## ageyoung adult                     0.44   0.05     8.16   0.00
## gendermale                        -0.01   0.03    -0.17   0.86
## ethnicityNon-Hispanic              0.57   0.07     8.37   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.29   0.06    -4.45   0.00
## White                                                         
## ethnicityOther Hispanic            0.15   0.08     1.92   0.06
## ethnicityOther or Multi           -0.57   0.08    -6.84   0.00
## fplfamily income 2x poverty        0.03   0.04     0.70   0.48
## threshold                                                     
## fplfamily income 3x poverty        0.06   0.05     1.10   0.27
## threshold                                                     
## fplfamily income 4x poverty        0.11   0.06     1.83   0.07
## threshold                                                     
## fplfamily income 5x poverty        0.12   0.06     1.87   0.06
## threshold                                                     
## fplfamily income more than         0.09   0.06     1.55   0.13
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.17   0.06     2.91   0.00
## citizen                                                       
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
# Confidence intervals are often more informative than p-values; request them with confint = TRUE (95% CIs by default).
summ(modela, confint = TRUE, digits = 3)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.060
## Adj. R² = -0.090 
## 
## Standard errors: Robust
## ---------------------------------------------------------------------------
##                                     Est.     2.5%    97.5%   t val.       p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept)                        3.714    3.564    3.864   49.101   0.000
## refEDpartial college and           0.355    0.264    0.445    7.767   0.000
## below                                                                      
## agemiddle-aged                     0.363    0.296    0.430   10.776   0.000
## ageolder adult                     0.350    0.236    0.464    6.078   0.000
## ageyoung adult                     0.443    0.335    0.551    8.156   0.000
## gendermale                        -0.005   -0.063    0.053   -0.174   0.862
## ethnicityNon-Hispanic              0.569    0.434    0.704    8.365   0.000
## Black                                                                      
## ethnicityNon-Hispanic             -0.287   -0.415   -0.159   -4.455   0.000
## White                                                                      
## ethnicityOther Hispanic            0.146   -0.004    0.297    1.923   0.057
## ethnicityOther or Multi           -0.566   -0.730   -0.402   -6.839   0.000
## fplfamily income 2x poverty        0.031   -0.056    0.118    0.701   0.485
## threshold                                                                  
## fplfamily income 3x poverty        0.061   -0.048    0.170    1.105   0.272
## threshold                                                                  
## fplfamily income 4x poverty        0.108   -0.009    0.224    1.830   0.070
## threshold                                                                  
## fplfamily income 5x poverty        0.118   -0.007    0.243    1.867   0.065
## threshold                                                                  
## fplfamily income more than         0.095   -0.027    0.216    1.545   0.125
## 5x poverty threshold                                                       
## citizenshipnot U,S,                0.169    0.054    0.283    2.911   0.004
## citizen                                                                    
## ---------------------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.372
summ(modela, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.09 
## 
## Standard errors: Robust
## ----------------------------------------------------------------
##                                    Est.    2.5%   97.5%   t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept)                        3.71    3.56    3.86    49.10
## refEDpartial college and           0.35    0.26    0.45     7.77
## below                                                           
## agemiddle-aged                     0.36    0.30    0.43    10.78
## ageolder adult                     0.35    0.24    0.46     6.08
## ageyoung adult                     0.44    0.34    0.55     8.16
## gendermale                        -0.01   -0.06    0.05    -0.17
## ethnicityNon-Hispanic              0.57    0.43    0.70     8.37
## Black                                                           
## ethnicityNon-Hispanic             -0.29   -0.42   -0.16    -4.45
## White                                                           
## ethnicityOther Hispanic            0.15   -0.00    0.30     1.92
## ethnicityOther or Multi           -0.57   -0.73   -0.40    -6.84
## fplfamily income 2x poverty        0.03   -0.06    0.12     0.70
## threshold                                                       
## fplfamily income 3x poverty        0.06   -0.05    0.17     1.10
## threshold                                                       
## fplfamily income 4x poverty        0.11   -0.01    0.22     1.83
## threshold                                                       
## fplfamily income 5x poverty        0.12   -0.01    0.24     1.87
## threshold                                                       
## fplfamily income more than         0.09   -0.03    0.22     1.55
## 5x poverty threshold                                            
## citizenshipnot U,S,                0.17    0.05    0.28     2.91
## citizen                                                         
## ----------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
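
Because the outcome is log(monoEthyl), each coefficient is a difference in logs, and exponentiating it gives a ratio of geometric means relative to the reference group. The sketch below back-transforms the refED estimate and its 95% CI, using the three-digit values copied from the summ() output above; the variable names are illustrative only.

```r
# Log-scale estimate and 95% CI for refED (partial college and below),
# copied from the summ(modela, confint = TRUE, digits = 3) output
est  <- 0.355
low  <- 0.264
high <- 0.445

# exp() turns a difference in logs into a ratio of geometric means
ratio <- exp(c(estimate = est, lower = low, upper = high))
round(ratio, 2)

# The same quantities expressed as a percent difference from the reference group
pct <- round(100 * (ratio - 1), 1)
pct
```

Read this way, the refED coefficient corresponds to roughly 43% higher monoEthyl levels in households with partial college education or below, holding the other predictors fixed.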
# Coefficient (forest) plots
plot_summs(modela)

plot_summs(modela, robust = TRUE)

plot_summs(modela, inner_ci_level = .9)

# plot coefficient uncertainty as normal distributions
plot_summs(modela, plot.distributions = TRUE, inner_ci_level = .9)

# table output for Word and RMarkdown documents
# standard errors are shown in parentheses
export_summs(modela, scale = TRUE)
Model 1
Term                                             Estimate    S.E.
-----------------------------------------------  ----------  ------
(Intercept)                                       3.71 ***   (0.08)
refEDpartial college and below                    0.35 ***   (0.05)
agemiddle-aged                                    0.36 ***   (0.03)
ageolder adult                                    0.35 ***   (0.06)
ageyoung adult                                    0.44 ***   (0.05)
gendermale                                       -0.01       (0.03)
ethnicityNon-Hispanic Black                       0.57 ***   (0.07)
ethnicityNon-Hispanic White                      -0.29 ***   (0.06)
ethnicityOther Hispanic                           0.15       (0.08)
ethnicityOther or Multi                          -0.57 ***   (0.08)
fplfamily income 2x poverty threshold             0.03       (0.04)
fplfamily income 3x poverty threshold             0.06       (0.05)
fplfamily income 4x poverty threshold             0.11       (0.06)
fplfamily income 5x poverty threshold             0.12       (0.06)
fplfamily income more than 5x poverty threshold   0.09       (0.06)
citizenshipnot U,S, citizen                       0.17 **    (0.06)
N                                                 19218
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# confidence intervals instead of standard errors
export_summs(modela, scale = TRUE,
             error_format = "[{conf.low}, {conf.high}]")
Model 1
Term                                             Estimate    95% CI
-----------------------------------------------  ----------  --------------
(Intercept)                                       3.71 ***   [3.56, 3.86]
refEDpartial college and below                    0.35 ***   [0.26, 0.45]
agemiddle-aged                                    0.36 ***   [0.30, 0.43]
ageolder adult                                    0.35 ***   [0.24, 0.46]
ageyoung adult                                    0.44 ***   [0.34, 0.55]
gendermale                                       -0.01       [-0.06, 0.05]
ethnicityNon-Hispanic Black                       0.57 ***   [0.43, 0.70]
ethnicityNon-Hispanic White                      -0.29 ***   [-0.42, -0.16]
ethnicityOther Hispanic                           0.15       [-0.00, 0.30]
ethnicityOther or Multi                          -0.57 ***   [-0.73, -0.40]
fplfamily income 2x poverty threshold             0.03       [-0.06, 0.12]
fplfamily income 3x poverty threshold             0.06       [-0.05, 0.17]
fplfamily income 4x poverty threshold             0.11       [-0.01, 0.22]
fplfamily income 5x poverty threshold             0.12       [-0.01, 0.24]
fplfamily income more than 5x poverty threshold   0.09       [-0.03, 0.22]
citizenshipnot U,S, citizen                       0.17 **    [0.05, 0.28]
N                                                 19218
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.

Renaming variables in the forest plot

foresta <- plot_summs(
        point.size = 3,
        fontsize=8,
        colors = "darkseagreen3",
        modela, coefs = c("Household Education Partial College and Below
                                 College and Beyond (ref)" = "refEDpartial college and below", 
                          
                                 "Age: Middle-Aged
                                 Child (ref)" = "agemiddle-aged",
                          
                                 "Age: Older Adult
                                 Child (ref)" = "ageolder adult",
                          
                                 "Age: Young Adult
                                 Child (ref)" = "ageyoung adult",
                          
                                 "Gender: Male
                                 Gender: Female (ref)" = "gendermale",
                          
                                 "Ethnicity: Non-Hispanic Black
                                 Mexican American (ref)" = "ethnicityNon-Hispanic Black",
                          
                                 "Ethnicity: Non-Hispanic White
                                 Mexican American (ref)" = "ethnicityNon-Hispanic White",
                          
                                 "Ethnicity: Other Hispanic
                                 Mexican American (ref)" = "ethnicityOther Hispanic",
                          
                                 "Ethnicity: Other or Multi
                                 Mexican American (ref)" = "ethnicityOther or Multi",
                          
                                 "Family Income to Poverty Ratio: 2x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 2x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: 3x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 3x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: 4x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 4x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: 5x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 5x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: more than 5x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income more than 5x poverty threshold",
                          
                                 "Citizenship Status: Not U.S. Citizen
                                 U.S. Citizen by birth or naturalization (ref)" = "citizenshipnot U,S, citizen"),
        
                          scale = TRUE, robust = TRUE)

foresta
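
The labels above contain a literal line break followed by a long run of indentation spaces inside each string, and those spaces become part of the axis label. One possible alternative, sketched below, is to build the named vector with an explicit "\n". The helper make_label() is hypothetical (it is not part of jtools), and only three coefficients are shown for brevity.

```r
# Hypothetical helper: build a two-line label "<level>\n<reference> (ref)"
make_label <- function(level, reference) {
  paste0(level, "\n", reference, " (ref)")
}

# Coefficient names exactly as they appear in the model output
coef_names <- c("gendermale", "ageyoung adult", "ethnicityOther Hispanic")

# Display labels, in the same order as coef_names
labels <- c(
  make_label("Gender: Male", "Female"),
  make_label("Age: Young Adult", "Child"),
  make_label("Ethnicity: Other Hispanic", "Mexican American")
)

# plot_summs(coefs = ...) takes a named vector: names are labels, values are coefficient names
coef_labels <- setNames(coef_names, labels)
coef_labels

# Assuming modela from above, the vector would be passed as:
# plot_summs(modela, coefs = coef_labels)
```

Keeping the coefficient names and the labels in two parallel vectors also makes it easier to reuse the same labels for Models B and C.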

Forest Model for Model B:

Model B drops gender from Model A.
modelb <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)

summ(modelb)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.71   0.07    51.36   0.00
## refEDpartial college and           0.35   0.05     7.77   0.00
## below                                                         
## agemiddle-aged                     0.36   0.03    10.78   0.00
## ageolder adult                     0.35   0.06     6.09   0.00
## ageyoung adult                     0.44   0.05     8.15   0.00
## ethnicityNon-Hispanic              0.57   0.07     8.39   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.29   0.06    -4.45   0.00
## White                                                         
## ethnicityOther Hispanic            0.15   0.08     1.92   0.06
## ethnicityOther or Multi           -0.57   0.08    -6.83   0.00
## fplfamily income 2x poverty        0.03   0.04     0.69   0.49
## threshold                                                     
## fplfamily income 3x poverty        0.06   0.06     1.09   0.28
## threshold                                                     
## fplfamily income 4x poverty        0.11   0.06     1.82   0.07
## threshold                                                     
## fplfamily income 5x poverty        0.12   0.06     1.85   0.07
## threshold                                                     
## fplfamily income more than         0.09   0.06     1.53   0.13
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.17   0.06     2.90   0.00
## citizen                                                       
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
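summ(modelb, robust = "HC1") # robust is ignored for svyglm objects: design-based (robust) standard errors are already the default, hence the warning below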
## Warning in summ.svyglm(modelb, robust = "HC1"): Robust standard errors are reported by default
##  in the survey package.
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.71   0.07    51.36   0.00
## refEDpartial college and           0.35   0.05     7.77   0.00
## below                                                         
## agemiddle-aged                     0.36   0.03    10.78   0.00
## ageolder adult                     0.35   0.06     6.09   0.00
## ageyoung adult                     0.44   0.05     8.15   0.00
## ethnicityNon-Hispanic              0.57   0.07     8.39   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.29   0.06    -4.45   0.00
## White                                                         
## ethnicityOther Hispanic            0.15   0.08     1.92   0.06
## ethnicityOther or Multi           -0.57   0.08    -6.83   0.00
## fplfamily income 2x poverty        0.03   0.04     0.69   0.49
## threshold                                                     
## fplfamily income 3x poverty        0.06   0.06     1.09   0.28
## threshold                                                     
## fplfamily income 4x poverty        0.11   0.06     1.82   0.07
## threshold                                                     
## fplfamily income 5x poverty        0.12   0.06     1.85   0.07
## threshold                                                     
## fplfamily income more than         0.09   0.06     1.53   0.13
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.17   0.06     2.90   0.00
## citizen                                                       
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
# Confidence intervals are often more informative than p-values; request them with confint = TRUE (95% CIs by default).
summ(modelb, confint = TRUE, digits = 3)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.060
## Adj. R² = -0.080 
## 
## Standard errors: Robust
## ---------------------------------------------------------------------------
##                                     Est.     2.5%    97.5%   t val.       p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept)                        3.712    3.569    3.855   51.356   0.000
## refEDpartial college and           0.355    0.264    0.445    7.765   0.000
## below                                                                      
## agemiddle-aged                     0.363    0.296    0.430   10.782   0.000
## ageolder adult                     0.350    0.236    0.464    6.089   0.000
## ageyoung adult                     0.443    0.335    0.551    8.155   0.000
## ethnicityNon-Hispanic              0.570    0.435    0.704    8.388   0.000
## Black                                                                      
## ethnicityNon-Hispanic             -0.287   -0.415   -0.159   -4.452   0.000
## White                                                                      
## ethnicityOther Hispanic            0.146   -0.004    0.297    1.923   0.057
## ethnicityOther or Multi           -0.566   -0.730   -0.402   -6.834   0.000
## fplfamily income 2x poverty        0.031   -0.057    0.118    0.693   0.490
## threshold                                                                  
## fplfamily income 3x poverty        0.060   -0.049    0.170    1.094   0.277
## threshold                                                                  
## fplfamily income 4x poverty        0.107   -0.009    0.224    1.822   0.071
## threshold                                                                  
## fplfamily income 5x poverty        0.118   -0.008    0.244    1.854   0.066
## threshold                                                                  
## fplfamily income more than         0.094   -0.028    0.216    1.533   0.128
## 5x poverty threshold                                                       
## citizenshipnot U,S,                0.168    0.053    0.283    2.899   0.005
## citizen                                                                    
## ---------------------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.372
summ(modelb, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08 
## 
## Standard errors: Robust
## ----------------------------------------------------------------
##                                    Est.    2.5%   97.5%   t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept)                        3.71    3.57    3.86    51.36
## refEDpartial college and           0.35    0.26    0.45     7.77
## below                                                           
## agemiddle-aged                     0.36    0.30    0.43    10.78
## ageolder adult                     0.35    0.24    0.46     6.09
## ageyoung adult                     0.44    0.34    0.55     8.15
## ethnicityNon-Hispanic              0.57    0.44    0.70     8.39
## Black                                                           
## ethnicityNon-Hispanic             -0.29   -0.42   -0.16    -4.45
## White                                                           
## ethnicityOther Hispanic            0.15   -0.00    0.30     1.92
## ethnicityOther or Multi           -0.57   -0.73   -0.40    -6.83
## fplfamily income 2x poverty        0.03   -0.06    0.12     0.69
## threshold                                                       
## fplfamily income 3x poverty        0.06   -0.05    0.17     1.09
## threshold                                                       
## fplfamily income 4x poverty        0.11   -0.01    0.22     1.82
## threshold                                                       
## fplfamily income 5x poverty        0.12   -0.01    0.24     1.85
## threshold                                                       
## fplfamily income more than         0.09   -0.03    0.22     1.53
## 5x poverty threshold                                            
## citizenshipnot U,S,                0.17    0.05    0.28     2.90
## citizen                                                         
## ----------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
# Coefficient (forest) plots
plot_summs(modelb)

plot_summs(modelb, inner_ci_level = .9)

plot_summs(modelb, robust = TRUE)

# plot coefficient uncertainty as normal distributions
plot_summs(modelb, plot.distributions = TRUE, inner_ci_level = .9)

# table output for Word and RMarkdown documents
# standard errors are shown in parentheses
export_summs(modelb, scale = TRUE)
Model 1
Term                                             Estimate    S.E.
-----------------------------------------------  ----------  ------
(Intercept)                                       3.71 ***   (0.07)
refEDpartial college and below                    0.35 ***   (0.05)
agemiddle-aged                                    0.36 ***   (0.03)
ageolder adult                                    0.35 ***   (0.06)
ageyoung adult                                    0.44 ***   (0.05)
ethnicityNon-Hispanic Black                       0.57 ***   (0.07)
ethnicityNon-Hispanic White                      -0.29 ***   (0.06)
ethnicityOther Hispanic                           0.15       (0.08)
ethnicityOther or Multi                          -0.57 ***   (0.08)
fplfamily income 2x poverty threshold             0.03       (0.04)
fplfamily income 3x poverty threshold             0.06       (0.06)
fplfamily income 4x poverty threshold             0.11       (0.06)
fplfamily income 5x poverty threshold             0.12       (0.06)
fplfamily income more than 5x poverty threshold   0.09       (0.06)
citizenshipnot U,S, citizen                       0.17 **    (0.06)
N                                                 19218
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# confidence intervals instead of standard errors
export_summs(modelb, scale = TRUE,
             error_format = "[{conf.low}, {conf.high}]")
Model 1
Term                                             Estimate    95% CI
-----------------------------------------------  ----------  --------------
(Intercept)                                       3.71 ***   [3.57, 3.86]
refEDpartial college and below                    0.35 ***   [0.26, 0.45]
agemiddle-aged                                    0.36 ***   [0.30, 0.43]
ageolder adult                                    0.35 ***   [0.24, 0.46]
ageyoung adult                                    0.44 ***   [0.34, 0.55]
ethnicityNon-Hispanic Black                       0.57 ***   [0.44, 0.70]
ethnicityNon-Hispanic White                      -0.29 ***   [-0.42, -0.16]
ethnicityOther Hispanic                           0.15       [-0.00, 0.30]
ethnicityOther or Multi                          -0.57 ***   [-0.73, -0.40]
fplfamily income 2x poverty threshold             0.03       [-0.06, 0.12]
fplfamily income 3x poverty threshold             0.06       [-0.05, 0.17]
fplfamily income 4x poverty threshold             0.11       [-0.01, 0.22]
fplfamily income 5x poverty threshold             0.12       [-0.01, 0.24]
fplfamily income more than 5x poverty threshold   0.09       [-0.03, 0.22]
citizenshipnot U,S, citizen                       0.17 **    [0.05, 0.28]
N                                                 19218
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.

Renaming variables in the forest plot

forestb <- plot_summs(
        point.size = 3,
        fontsize=8,
        colors = "darkslateblue",
        modelb, coefs = c("Household Education Partial College and Below
                                 College and Beyond (ref)" = "refEDpartial college and below", 
                          
                                 "Age: Middle-Aged
                                 Child (ref)" = "agemiddle-aged",
                          
                                 "Age: Older Adult
                                 Child (ref)" = "ageolder adult",
                          
                                 "Age: Young Adult
                                 Child (ref)" = "ageyoung adult",
                          
                                 "Ethnicity: Non-Hispanic Black
                                 Mexican American (ref)" = "ethnicityNon-Hispanic Black",
                          
                                 "Ethnicity: Non-Hispanic White
                                 Mexican American (ref)" = "ethnicityNon-Hispanic White",
                          
                                 "Ethnicity: Other Hispanic
                                 Mexican American (ref)" = "ethnicityOther Hispanic",
                          
                                 "Ethnicity: Other or Multi
                                 Mexican American (ref)" = "ethnicityOther or Multi",
                          
                                 "Family Income to Poverty Ratio: 2x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 2x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: 3x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 3x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: 4x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 4x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: 5x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 5x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: more than 5x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income more than 5x poverty threshold",
                          
                                 "Citizenship Status: Not U.S. Citizen
                                 U.S. Citizen by birth or naturalization (ref)" = "citizenshipnot U,S, citizen"),
        
                          scale = TRUE, robust = TRUE)

forestb

Model C

Model C drops gender and citizenship from Model A.
modelc <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit)

summ(modelc)
## MODEL INFO:
## Observations: 19235
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.07 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.76   0.07    55.26   0.00
## refEDpartial college and           0.35   0.05     7.68   0.00
## below                                                         
## agemiddle-aged                     0.37   0.03    11.28   0.00
## ageolder adult                     0.36   0.06     6.16   0.00
## ageyoung adult                     0.46   0.05     8.42   0.00
## ethnicityNon-Hispanic              0.52   0.07     7.97   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.33   0.06    -5.45   0.00
## White                                                         
## ethnicityOther Hispanic            0.13   0.07     1.79   0.08
## ethnicityOther or Multi           -0.58   0.08    -7.17   0.00
## fplfamily income 2x poverty        0.03   0.04     0.63   0.53
## threshold                                                     
## fplfamily income 3x poverty        0.06   0.05     1.02   0.31
## threshold                                                     
## fplfamily income 4x poverty        0.10   0.06     1.69   0.09
## threshold                                                     
## fplfamily income 5x poverty        0.11   0.06     1.72   0.09
## threshold                                                     
## fplfamily income more than         0.08   0.06     1.38   0.17
## 5x poverty threshold                                          
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
summ(modelc, robust = "HC1") #robust standard errors 
## Warning in summ.svyglm(modelc, robust = "HC1"): Robust standard errors are reported by default
##  in the survey package.
## MODEL INFO:
## Observations: 19235
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.07 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.76   0.07    55.26   0.00
## refEDpartial college and           0.35   0.05     7.68   0.00
## below                                                         
## agemiddle-aged                     0.37   0.03    11.28   0.00
## ageolder adult                     0.36   0.06     6.16   0.00
## ageyoung adult                     0.46   0.05     8.42   0.00
## ethnicityNon-Hispanic              0.52   0.07     7.97   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.33   0.06    -5.45   0.00
## White                                                         
## ethnicityOther Hispanic            0.13   0.07     1.79   0.08
## ethnicityOther or Multi           -0.58   0.08    -7.17   0.00
## fplfamily income 2x poverty        0.03   0.04     0.63   0.53
## threshold                                                     
## fplfamily income 3x poverty        0.06   0.05     1.02   0.31
## threshold                                                     
## fplfamily income 4x poverty        0.10   0.06     1.69   0.09
## threshold                                                     
## fplfamily income 5x poverty        0.11   0.06     1.72   0.09
## threshold                                                     
## fplfamily income more than         0.08   0.06     1.38   0.17
## 5x poverty threshold                                          
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
summ(modelc, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ reports 95% CIs by default
## MODEL INFO:
## Observations: 19235
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.059
## Adj. R² = -0.071 
## 
## Standard errors: Robust
## ---------------------------------------------------------------------------
##                                     Est.     2.5%    97.5%   t val.       p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept)                        3.763    3.628    3.898   55.256   0.000
## refEDpartial college and           0.351    0.260    0.441    7.680   0.000
## below                                                                      
## agemiddle-aged                     0.375    0.309    0.440   11.283   0.000
## ageolder adult                     0.356    0.242    0.471    6.158   0.000
## ageyoung adult                     0.455    0.348    0.562    8.415   0.000
## ethnicityNon-Hispanic              0.524    0.394    0.654    7.975   0.000
## Black                                                                      
## ethnicityNon-Hispanic             -0.335   -0.456   -0.213   -5.455   0.000
## White                                                                      
## ethnicityOther Hispanic            0.134   -0.014    0.283    1.795   0.075
## ethnicityOther or Multi           -0.582   -0.742   -0.421   -7.175   0.000
## fplfamily income 2x poverty        0.028   -0.060    0.115    0.626   0.532
## threshold                                                                  
## fplfamily income 3x poverty        0.056   -0.052    0.164    1.024   0.308
## threshold                                                                  
## fplfamily income 4x poverty        0.098   -0.017    0.213    1.687   0.094
## threshold                                                                  
## fplfamily income 5x poverty        0.108   -0.017    0.232    1.717   0.089
## threshold                                                                  
## fplfamily income more than         0.084   -0.037    0.205    1.379   0.171
## 5x poverty threshold                                                       
## ---------------------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.374
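
Because the outcome is log(monoEthyl), each coefficient above is a difference in log concentration relative to the reference group. As a rough sketch, a few rows of the table can be back-transformed into approximate percent differences in the geometric mean. The numbers are copied by hand from the output above, and the helper name pct_change is made up for illustration; it is not part of jtools or survey.

```r
# Estimates and 95% limits copied by hand from summ(modelc, confint = TRUE, digits = 3)
coefs <- data.frame(
  term = c("ethnicityNon-Hispanic Black",
           "ethnicityNon-Hispanic White",
           "refEDpartial college and below"),
  est  = c(0.524, -0.335, 0.351),
  low  = c(0.394, -0.456, 0.260),
  high = c(0.654, -0.213, 0.441)
)

# Hypothetical helper: percent difference implied by a coefficient on the log scale
pct_change <- function(b) 100 * (exp(b) - 1)

coefs$pct_est  <- round(pct_change(coefs$est), 1)
coefs$pct_low  <- round(pct_change(coefs$low), 1)
coefs$pct_high <- round(pct_change(coefs$high), 1)

print(coefs[, c("term", "pct_est", "pct_low", "pct_high")])
```

Read this way, a positive coefficient corresponds to a higher geometric mean than the reference group and a negative one to a lower geometric mean, holding the other predictors fixed.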
summ(modelc, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19235
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.07 
## 
## Standard errors: Robust
## ----------------------------------------------------------------
##                                    Est.    2.5%   97.5%   t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept)                        3.76    3.63    3.90    55.26
## refEDpartial college and           0.35    0.26    0.44     7.68
## below                                                           
## agemiddle-aged                     0.37    0.31    0.44    11.28
## ageolder adult                     0.36    0.24    0.47     6.16
## ageyoung adult                     0.46    0.35    0.56     8.42
## ethnicityNon-Hispanic              0.52    0.39    0.65     7.97
## Black                                                           
## ethnicityNon-Hispanic             -0.33   -0.46   -0.21    -5.45
## White                                                           
## ethnicityOther Hispanic            0.13   -0.01    0.28     1.79
## ethnicityOther or Multi           -0.58   -0.74   -0.42    -7.17
## fplfamily income 2x poverty        0.03   -0.06    0.11     0.63
## threshold                                                       
## fplfamily income 3x poverty        0.06   -0.05    0.16     1.02
## threshold                                                       
## fplfamily income 4x poverty        0.10   -0.02    0.21     1.69
## threshold                                                       
## fplfamily income 5x poverty        0.11   -0.02    0.23     1.72
## threshold                                                       
## fplfamily income more than         0.08   -0.04    0.20     1.38
## 5x poverty threshold                                            
## ----------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
# THE GRAPH
plot_summs(modelc)

plot_summs(modelc, inner_ci_level = .9)

plot_summs(modelc, robust = TRUE)

# plot coefficient uncertainty as normal distributions
plot_summs(modelc, plot.distributions = TRUE, inner_ci_level = .9)

# table output for Word and RMarkdown documents
# standard errors are shown in parentheses
export_summs(modelc, scale = TRUE)
                                                  Model 1
(Intercept)                                       3.76 ***   (0.07)
refEDpartial college and below                    0.35 ***   (0.05)
agemiddle-aged                                    0.37 ***   (0.03)
ageolder adult                                    0.36 ***   (0.06)
ageyoung adult                                    0.46 ***   (0.05)
ethnicityNon-Hispanic Black                       0.52 ***   (0.07)
ethnicityNon-Hispanic White                       -0.33 ***  (0.06)
ethnicityOther Hispanic                           0.13       (0.07)
ethnicityOther or Multi                           -0.58 ***  (0.08)
fplfamily income 2x poverty threshold             0.03       (0.04)
fplfamily income 3x poverty threshold             0.06       (0.05)
fplfamily income 4x poverty threshold             0.10       (0.06)
fplfamily income 5x poverty threshold             0.11       (0.06)
fplfamily income more than 5x poverty threshold   0.08       (0.06)
N                                                 19235
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# confidence intervals instead of standard errors
export_summs(modelc, scale = TRUE,
             error_format = "[{conf.low}, {conf.high}]")
                                                  Model 1
(Intercept)                                       3.76 ***   [3.63, 3.90]
refEDpartial college and below                    0.35 ***   [0.26, 0.44]
agemiddle-aged                                    0.37 ***   [0.31, 0.44]
ageolder adult                                    0.36 ***   [0.24, 0.47]
ageyoung adult                                    0.46 ***   [0.35, 0.56]
ethnicityNon-Hispanic Black                       0.52 ***   [0.39, 0.65]
ethnicityNon-Hispanic White                       -0.33 ***  [-0.46, -0.21]
ethnicityOther Hispanic                           0.13       [-0.01, 0.28]
ethnicityOther or Multi                           -0.58 ***  [-0.74, -0.42]
fplfamily income 2x poverty threshold             0.03       [-0.06, 0.11]
fplfamily income 3x poverty threshold             0.06       [-0.05, 0.16]
fplfamily income 4x poverty threshold             0.10       [-0.02, 0.21]
fplfamily income 5x poverty threshold             0.11       [-0.02, 0.23]
fplfamily income more than 5x poverty threshold   0.08       [-0.04, 0.20]
N                                                 19235
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.

Renaming variables in the forest plot for Model C

forestc <- plot_summs(
        point.size = 3,
        fontsize=8,
        colors = "deepskyblue4",
        modelc, coefs = c("Household Education: Partial College and Below
                                 College and Beyond (ref)" = "refEDpartial college and below", 
                          
                                 "Age: Middle-Aged
                                 Child (ref)" = "agemiddle-aged",
                          
                                 "Age: Older Adult
                                 Child (ref)" = "ageolder adult",
                          
                                 "Age: Young Adult
                                 Child (ref)" = "ageyoung adult",
                          
                                 "Ethnicity: Non-Hispanic Black
                                 Mexican American (ref)" = "ethnicityNon-Hispanic Black",
                          
                                 "Ethnicity: Non-Hispanic White
                                 Mexican American (ref)" = "ethnicityNon-Hispanic White",
                          
                                 "Ethnicity: Other Hispanic
                                 Mexican American (ref)" = "ethnicityOther Hispanic",
                          
                                 "Ethnicity: Other or Multi
                                 Mexican American (ref)" = "ethnicityOther or Multi",
                          
                                 "Family Income to Poverty Ratio: 2x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 2x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: 3x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 3x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: 4x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 4x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: 5x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income 5x poverty threshold",
                          
                                 "Family Income to Poverty Ratio: more than 5x Poverty threshold
                                 At poverty threshold (ref)" = "fplfamily income more than 5x poverty threshold"),
        
                          scale = TRUE, robust = TRUE)

forestc

For “Results” in the “Draft_21feb” Google Doc:

SUBSET: ADULTS

Restrict the design to adults (RIDAGEYR > 19, i.e. age 20 and over) and add “adultED” as a predictor.

subset_adult <- subset(nhc, RIDAGEYR > 19)

model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+adultED, design=subset_adult, na.action = na.omit)

summ(model_adult)
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.13 
## 
## Standard errors: Robust
## ---------------------------------------------------------------
##                                     Est.   S.E.   t val.      p
## -------------------------------- ------- ------ -------- ------
## (Intercept)                         4.44   0.11    39.26   0.00
## refEDpartial college and            0.14   0.07     2.08   0.04
## below                                                          
## ageolder adult                     -0.02   0.05    -0.36   0.72
## ageyoung adult                      0.08   0.06     1.33   0.19
## gendermale                          0.03   0.04     0.94   0.35
## ethnicityNon-Hispanic               0.48   0.07     6.61   0.00
## Black                                                          
## ethnicityNon-Hispanic              -0.38   0.07    -5.48   0.00
## White                                                          
## ethnicityOther Hispanic             0.06   0.08     0.71   0.48
## ethnicityOther or Multi            -0.70   0.09    -7.60   0.00
## fplfamily income 2x poverty         0.07   0.05     1.35   0.18
## threshold                                                      
## fplfamily income 3x poverty         0.09   0.07     1.30   0.20
## threshold                                                      
## fplfamily income 4x poverty         0.15   0.07     2.10   0.04
## threshold                                                      
## fplfamily income 5x poverty         0.16   0.08     2.07   0.04
## threshold                                                      
## fplfamily income more than          0.18   0.07     2.52   0.01
## 5x poverty threshold                                           
## citizenshipnot U,S,                 0.15   0.07     2.17   0.03
## citizen                                                        
## adultEDcollege grad or             -0.39   0.08    -5.25   0.00
## above                                                          
## adultEDhigh school                 -0.06   0.06    -1.11   0.27
## grad/GED                                                       
## adultEDless than 9th grade         -0.19   0.07    -2.48   0.01
## adultEDsome college or AA          -0.16   0.07    -2.36   0.02
## ---------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.49
summ(model_adult, robust = "HC1") #robust standard errors 
## Warning in summ.svyglm(model_adult, robust = "HC1"): Robust standard errors are reported by default
##  in the survey package.
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.13 
## 
## Standard errors: Robust
## ---------------------------------------------------------------
##                                     Est.   S.E.   t val.      p
## -------------------------------- ------- ------ -------- ------
## (Intercept)                         4.44   0.11    39.26   0.00
## refEDpartial college and            0.14   0.07     2.08   0.04
## below                                                          
## ageolder adult                     -0.02   0.05    -0.36   0.72
## ageyoung adult                      0.08   0.06     1.33   0.19
## gendermale                          0.03   0.04     0.94   0.35
## ethnicityNon-Hispanic               0.48   0.07     6.61   0.00
## Black                                                          
## ethnicityNon-Hispanic              -0.38   0.07    -5.48   0.00
## White                                                          
## ethnicityOther Hispanic             0.06   0.08     0.71   0.48
## ethnicityOther or Multi            -0.70   0.09    -7.60   0.00
## fplfamily income 2x poverty         0.07   0.05     1.35   0.18
## threshold                                                      
## fplfamily income 3x poverty         0.09   0.07     1.30   0.20
## threshold                                                      
## fplfamily income 4x poverty         0.15   0.07     2.10   0.04
## threshold                                                      
## fplfamily income 5x poverty         0.16   0.08     2.07   0.04
## threshold                                                      
## fplfamily income more than          0.18   0.07     2.52   0.01
## 5x poverty threshold                                           
## citizenshipnot U,S,                 0.15   0.07     2.17   0.03
## citizen                                                        
## adultEDcollege grad or             -0.39   0.08    -5.25   0.00
## above                                                          
## adultEDhigh school                 -0.06   0.06    -1.11   0.27
## grad/GED                                                       
## adultEDless than 9th grade         -0.19   0.07    -2.48   0.01
## adultEDsome college or AA          -0.16   0.07    -2.36   0.02
## ---------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.49
summ(model_adult, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ reports 95% CIs by default
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.057
## Adj. R² = -0.128 
## 
## Standard errors: Robust
## ----------------------------------------------------------------------------
##                                      Est.     2.5%    97.5%   t val.       p
## -------------------------------- -------- -------- -------- -------- -------
## (Intercept)                         4.444    4.219    4.668   39.259   0.000
## refEDpartial college and            0.141    0.007    0.276    2.081   0.040
## below                                                                       
## ageolder adult                     -0.020   -0.128    0.089   -0.359   0.720
## ageyoung adult                      0.077   -0.038    0.193    1.334   0.185
## gendermale                          0.034   -0.038    0.106    0.937   0.351
## ethnicityNon-Hispanic               0.481    0.337    0.626    6.610   0.000
## Black                                                                       
## ethnicityNon-Hispanic              -0.379   -0.516   -0.242   -5.475   0.000
## White                                                                       
## ethnicityOther Hispanic             0.060   -0.108    0.228    0.707   0.481
## ethnicityOther or Multi            -0.702   -0.885   -0.519   -7.598   0.000
## fplfamily income 2x poverty         0.073   -0.034    0.180    1.353   0.179
## threshold                                                                   
## fplfamily income 3x poverty         0.087   -0.045    0.220    1.303   0.195
## threshold                                                                   
## fplfamily income 4x poverty         0.146    0.008    0.284    2.103   0.038
## threshold                                                                   
## fplfamily income 5x poverty         0.158    0.007    0.308    2.075   0.040
## threshold                                                                   
## fplfamily income more than          0.178    0.038    0.318    2.520   0.013
## 5x poverty threshold                                                        
## citizenshipnot U,S,                 0.149    0.013    0.286    2.171   0.032
## citizen                                                                     
## adultEDcollege grad or             -0.394   -0.542   -0.245   -5.247   0.000
## above                                                                       
## adultEDhigh school                 -0.063   -0.176    0.049   -1.115   0.267
## grad/GED                                                                    
## adultEDless than 9th grade         -0.185   -0.333   -0.037   -2.477   0.015
## adultEDsome college or AA          -0.157   -0.288   -0.025   -2.357   0.020
## ----------------------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.488
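
The interval limits above can be roughly reproduced by hand from an estimate and its t value. The sketch below does this for the refED row, assuming the interval is built from a t distribution on the 106 residual degrees of freedom that the printed svyglm object reports further down. That is an assumption about how summ constructs the interval, not something the output states, and the inputs are rounded, so the result should only land close to the printed limits.

```r
# Values copied by hand from the refED row of the table above
est   <- 0.141   # coefficient for "refEDpartial college and below"
t_val <- 2.081   # reported t value

# Assumption: t reference distribution with the residual design df
df_resid <- 106

se <- est / t_val                                  # implied standard error
ci <- est + c(-1, 1) * qt(0.975, df_resid) * se    # two-sided 95% interval

round(c(se = se, lower = ci[1], upper = ci[2]), 3)
```

With so few residual degrees of freedom relative to the number of observations, the t multiplier is slightly larger than the familiar 1.96, which is why design-based intervals can be a little wider than a normal approximation would suggest.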
summ(model_adult, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.13 
## 
## Standard errors: Robust
## -----------------------------------------------------------------
##                                     Est.    2.5%   97.5%   t val.
## -------------------------------- ------- ------- ------- --------
## (Intercept)                         4.44    4.22    4.67    39.26
## refEDpartial college and            0.14    0.01    0.28     2.08
## below                                                            
## ageolder adult                     -0.02   -0.13    0.09    -0.36
## ageyoung adult                      0.08   -0.04    0.19     1.33
## gendermale                          0.03   -0.04    0.11     0.94
## ethnicityNon-Hispanic               0.48    0.34    0.63     6.61
## Black                                                            
## ethnicityNon-Hispanic              -0.38   -0.52   -0.24    -5.48
## White                                                            
## ethnicityOther Hispanic             0.06   -0.11    0.23     0.71
## ethnicityOther or Multi            -0.70   -0.89   -0.52    -7.60
## fplfamily income 2x poverty         0.07   -0.03    0.18     1.35
## threshold                                                        
## fplfamily income 3x poverty         0.09   -0.05    0.22     1.30
## threshold                                                        
## fplfamily income 4x poverty         0.15    0.01    0.28     2.10
## threshold                                                        
## fplfamily income 5x poverty         0.16    0.01    0.31     2.07
## threshold                                                        
## fplfamily income more than          0.18    0.04    0.32     2.52
## 5x poverty threshold                                             
## citizenshipnot U,S,                 0.15    0.01    0.29     2.17
## citizen                                                          
## adultEDcollege grad or             -0.39   -0.54   -0.24    -5.25
## above                                                            
## adultEDhigh school                 -0.06   -0.18    0.05    -1.11
## grad/GED                                                         
## adultEDless than 9th grade         -0.19   -0.33   -0.04    -2.48
## adultEDsome college or AA          -0.16   -0.29   -0.02    -2.36
## -----------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.49
# THE GRAPH
plot_summs(model_adult)

plot_summs(model_adult, inner_ci_level = .9)

plot_summs(model_adult, robust = TRUE)

# plot coefficient uncertainty as normal distributions
plot_summs(model_adult, plot.distributions = TRUE, inner_ci_level = .9)

# table output for Word and RMarkdown documents
# standard errors are shown in parentheses
export_summs(model_adult, scale = TRUE)
                                                  Model 1
(Intercept)                                       4.44 ***   (0.11)
refEDpartial college and below                    0.14 *     (0.07)
ageolder adult                                    -0.02      (0.05)
ageyoung adult                                    0.08       (0.06)
gendermale                                        0.03       (0.04)
ethnicityNon-Hispanic Black                       0.48 ***   (0.07)
ethnicityNon-Hispanic White                       -0.38 ***  (0.07)
ethnicityOther Hispanic                           0.06       (0.08)
ethnicityOther or Multi                           -0.70 ***  (0.09)
fplfamily income 2x poverty threshold             0.07       (0.05)
fplfamily income 3x poverty threshold             0.09       (0.07)
fplfamily income 4x poverty threshold             0.15 *     (0.07)
fplfamily income 5x poverty threshold             0.16 *     (0.08)
fplfamily income more than 5x poverty threshold   0.18 *     (0.07)
citizenshipnot U,S, citizen                       0.15 *     (0.07)
adultEDcollege grad or above                      -0.39 ***  (0.08)
adultEDhigh school grad/GED                       -0.06      (0.06)
adultEDless than 9th grade                        -0.19 *    (0.07)
adultEDsome college or AA                         -0.16 *    (0.07)
N                                                 12132
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# confidence intervals instead of standard errors
export_summs(model_adult, scale = TRUE,
             error_format = "[{conf.low}, {conf.high}]")
                                                  Model 1
(Intercept)                                       4.44 ***   [4.22, 4.67]
refEDpartial college and below                    0.14 *     [0.01, 0.28]
ageolder adult                                    -0.02      [-0.13, 0.09]
ageyoung adult                                    0.08       [-0.04, 0.19]
gendermale                                        0.03       [-0.04, 0.11]
ethnicityNon-Hispanic Black                       0.48 ***   [0.34, 0.63]
ethnicityNon-Hispanic White                       -0.38 ***  [-0.52, -0.24]
ethnicityOther Hispanic                           0.06       [-0.11, 0.23]
ethnicityOther or Multi                           -0.70 ***  [-0.89, -0.52]
fplfamily income 2x poverty threshold             0.07       [-0.03, 0.18]
fplfamily income 3x poverty threshold             0.09       [-0.05, 0.22]
fplfamily income 4x poverty threshold             0.15 *     [0.01, 0.28]
fplfamily income 5x poverty threshold             0.16 *     [0.01, 0.31]
fplfamily income more than 5x poverty threshold   0.18 *     [0.04, 0.32]
citizenshipnot U,S, citizen                       0.15 *     [0.01, 0.29]
adultEDcollege grad or above                      -0.39 ***  [-0.54, -0.24]
adultEDhigh school grad/GED                       -0.06      [-0.18, 0.05]
adultEDless than 9th grade                        -0.19 *    [-0.33, -0.04]
adultEDsome college or AA                         -0.16 *    [-0.29, -0.02]
N                                                 12132
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# check AIC of Model E and for interaction
subset_adult <- subset(nhc, RIDAGEYR > 19)

model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+adultED, design=subset_adult, na.action = na.omit)

ols_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+adultED, design=subset_adult, na.action = na.omit)
ols_adult
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, RIDAGEYR > 19)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl + citizenship + adultED, design = subset_adult, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         4.44356  
##                  refEDpartial college and below  
##                                         0.14119  
##                                  ageolder adult  
##                                        -0.01961  
##                                  ageyoung adult  
##                                         0.07748  
##                                      gendermale  
##                                         0.03390  
##                     ethnicityNon-Hispanic Black  
##                                         0.48139  
##                     ethnicityNon-Hispanic White  
##                                        -0.37902  
##                         ethnicityOther Hispanic  
##                                         0.05996  
##                         ethnicityOther or Multi  
##                                        -0.70217  
##           fplfamily income 2x poverty threshold  
##                                         0.07295  
##           fplfamily income 3x poverty threshold  
##                                         0.08713  
##           fplfamily income 4x poverty threshold  
##                                         0.14640  
##           fplfamily income 5x poverty threshold  
##                                         0.15751  
## fplfamily income more than 5x poverty threshold  
##                                         0.17778  
##                     citizenshipnot U,S, citizen  
##                                         0.14940  
##                    adultEDcollege grad or above  
##                                        -0.39358  
##                     adultEDhigh school grad/GED  
##                                        -0.06346  
##                      adultEDless than 9th grade  
##                                        -0.18504  
##                       adultEDsome college or AA  
##                                        -0.15667  
## 
## Degrees of Freedom: 12131 Total (i.e. Null);  106 Residual
##   (1965 observations deleted due to missingness)
## Null Deviance:       32000 
## Residual Deviance: 30180     AIC: 49030
# this gives an AIC of 49,030
# with gender, AIC is the same
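
# The AIC printed with a svyglm object comes from the underlying weighted glm fit,
# so it may be worth pairing it with a design-based test when deciding whether a
# block of terms (such as an interaction) belongs in the model. The sketch below
# shows one option, regTermTest() from the survey package, on the api example data
# that ships with survey. The data set, design, and model here are stand-ins chosen
# so the example runs on its own; they are not the NHANES analysis.

```r
library(survey)

# Example data bundled with the survey package (not NHANES)
data(api)

# One-stage cluster sample of school districts
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Survey-weighted model with an interaction between a numeric and a factor predictor
fit_int <- svyglm(api00 ~ ell + meals * stype, design = dclus1)

# Design-based Wald test of all interaction coefficients at once
wald <- regTermTest(fit_int, ~ meals:stype)
wald
```

# The same call pattern would apply to an interaction term such as fpl:adultED,
# assuming the fitted object was created with svyglm on the survey design.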

# checking adultED*fpl
model_adult_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*adultED+citizenship, design=subset_adult, na.action = na.omit)

ols_adult_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*adultED+citizenship, design=subset_adult, na.action = na.omit)
ols_adult_int
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, RIDAGEYR > 19)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl * adultED + citizenship, design = subset_adult, na.action = na.omit)
## 
## Coefficients:
##                                                                  (Intercept)  
##                                                                      4.36122  
##                                               refEDpartial college and below  
##                                                                      0.13774  
##                                                               ageolder adult  
##                                                                     -0.01526  
##                                                               ageyoung adult  
##                                                                      0.06749  
##                                                                   gendermale  
##                                                                      0.03305  
##                                                  ethnicityNon-Hispanic Black  
##                                                                      0.47803  
##                                                  ethnicityNon-Hispanic White  
##                                                                     -0.38392  
##                                                      ethnicityOther Hispanic  
##                                                                      0.05513  
##                                                      ethnicityOther or Multi  
##                                                                     -0.71591  
##                                        fplfamily income 2x poverty threshold  
##                                                                      0.22130  
##                                        fplfamily income 3x poverty threshold  
##                                                                      0.24930  
##                                        fplfamily income 4x poverty threshold  
##                                                                      0.17751  
##                                        fplfamily income 5x poverty threshold  
##                                                                      0.58635  
##                              fplfamily income more than 5x poverty threshold  
##                                                                      0.06208  
##                                                 adultEDcollege grad or above  
##                                                                     -0.10661  
##                                                  adultEDhigh school grad/GED  
##                                                                      0.07886  
##                                                   adultEDless than 9th grade  
##                                                                     -0.14456  
##                                                    adultEDsome college or AA  
##                                                                     -0.05738  
##                                                  citizenshipnot U,S, citizen  
##                                                                      0.15350  
##           fplfamily income 2x poverty threshold:adultEDcollege grad or above  
##                                                                     -0.38960  
##           fplfamily income 3x poverty threshold:adultEDcollege grad or above  
##                                                                     -0.33654  
##           fplfamily income 4x poverty threshold:adultEDcollege grad or above  
##                                                                     -0.10297  
##           fplfamily income 5x poverty threshold:adultEDcollege grad or above  
##                                                                     -0.65937  
## fplfamily income more than 5x poverty threshold:adultEDcollege grad or above  
##                                                                     -0.11581  
##            fplfamily income 2x poverty threshold:adultEDhigh school grad/GED  
##                                                                     -0.29087  
##            fplfamily income 3x poverty threshold:adultEDhigh school grad/GED  
##                                                                     -0.23135  
##            fplfamily income 4x poverty threshold:adultEDhigh school grad/GED  
##                                                                     -0.06144  
##            fplfamily income 5x poverty threshold:adultEDhigh school grad/GED  
##                                                                     -0.59091  
##  fplfamily income more than 5x poverty threshold:adultEDhigh school grad/GED  
##                                                                      0.24831  
##             fplfamily income 2x poverty threshold:adultEDless than 9th grade  
##                                                                     -0.08852  
##             fplfamily income 3x poverty threshold:adultEDless than 9th grade  
##                                                                     -0.12357  
##             fplfamily income 4x poverty threshold:adultEDless than 9th grade  
##                                                                      0.13382  
##             fplfamily income 5x poverty threshold:adultEDless than 9th grade  
##                                                                      0.15090  
##   fplfamily income more than 5x poverty threshold:adultEDless than 9th grade  
##                                                                     -0.52742  
##              fplfamily income 2x poverty threshold:adultEDsome college or AA  
##                                                                     -0.09355  
##              fplfamily income 3x poverty threshold:adultEDsome college or AA  
##                                                                     -0.18532  
##              fplfamily income 4x poverty threshold:adultEDsome college or AA  
##                                                                     -0.13516  
##              fplfamily income 5x poverty threshold:adultEDsome college or AA  
##                                                                     -0.42478  
##    fplfamily income more than 5x poverty threshold:adultEDsome college or AA  
##                                                                      0.12183  
## 
## Degrees of Freedom: 12131 Total (i.e. Null);  86 Residual
##   (1965 observations deleted due to missingness)
## Null Deviance:       32000 
## Residual Deviance: 30100     AIC: 49040
# this gives an AIC of 49,040

SUBSET: CHILDREN

The child models also add “childED” as a covariate.

subset_child <- subset(nhc, RIDAGEYR <= 19)

# model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+childED, design=subset_child, na.action = na.omit)

# "model M" in document

model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*childED+fpl*childED+citizenship, design=subset_child, na.action = na.omit)

summ(model_child)
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.11 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.74   0.09    39.62   0.00
## refEDpartial college and           0.32   0.07     4.83   0.00
## below                                                         
## ageyoung adult                    -0.11   0.10    -1.08   0.28
## gendermale                        -0.18   0.04    -4.18   0.00
## ethnicityNon-Hispanic              0.67   0.07     9.30   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.24   0.07    -3.28   0.00
## White                                                         
## ethnicityOther Hispanic            0.29   0.12     2.36   0.02
## ethnicityOther or Multi           -0.24   0.11    -2.24   0.03
## childEDsecondary                   0.45   0.12     3.76   0.00
## fplfamily income 2x poverty       -0.07   0.06    -1.12   0.26
## threshold                                                     
## fplfamily income 3x poverty       -0.02   0.08    -0.26   0.80
## threshold                                                     
## fplfamily income 4x poverty        0.10   0.09     1.17   0.24
## threshold                                                     
## fplfamily income 5x poverty        0.04   0.11     0.39   0.70
## threshold                                                     
## fplfamily income more than        -0.19   0.10    -1.82   0.07
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.14   0.09     1.46   0.15
## citizen                                                       
## ethnicityNon-Hispanic             -0.06   0.13    -0.46   0.65
## Black:childEDsecondary                                        
## ethnicityNon-Hispanic              0.18   0.13     1.39   0.17
## White:childEDsecondary                                        
## ethnicityOther                    -0.07   0.20    -0.37   0.71
## Hispanic:childEDsecondary                                     
## ethnicityOther or                  0.01   0.17     0.07   0.95
## Multi:childEDsecondary                                        
## childEDsecondary:fplfamily         0.00   0.12     0.01   1.00
## income 2x poverty threshold                                   
## childEDsecondary:fplfamily         0.10   0.14     0.69   0.49
## income 3x poverty threshold                                   
## childEDsecondary:fplfamily        -0.07   0.17    -0.41   0.68
## income 4x poverty threshold                                   
## childEDsecondary:fplfamily         0.21   0.21     1.00   0.32
## income 5x poverty threshold                                   
## childEDsecondary:fplfamily         0.30   0.17     1.79   0.08
## income more than 5x poverty                                   
## threshold                                                     
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 1.83
summ(model_child, robust = "HC1") # robust has no effect here: svyglm fits already report robust standard errors (see the warning below)
## Warning in summ.svyglm(model_child, robust = "HC1"): Robust standard errors are reported by default
##  in the survey package.
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.11 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.74   0.09    39.62   0.00
## refEDpartial college and           0.32   0.07     4.83   0.00
## below                                                         
## ageyoung adult                    -0.11   0.10    -1.08   0.28
## gendermale                        -0.18   0.04    -4.18   0.00
## ethnicityNon-Hispanic              0.67   0.07     9.30   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.24   0.07    -3.28   0.00
## White                                                         
## ethnicityOther Hispanic            0.29   0.12     2.36   0.02
## ethnicityOther or Multi           -0.24   0.11    -2.24   0.03
## childEDsecondary                   0.45   0.12     3.76   0.00
## fplfamily income 2x poverty       -0.07   0.06    -1.12   0.26
## threshold                                                     
## fplfamily income 3x poverty       -0.02   0.08    -0.26   0.80
## threshold                                                     
## fplfamily income 4x poverty        0.10   0.09     1.17   0.24
## threshold                                                     
## fplfamily income 5x poverty        0.04   0.11     0.39   0.70
## threshold                                                     
## fplfamily income more than        -0.19   0.10    -1.82   0.07
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.14   0.09     1.46   0.15
## citizen                                                       
## ethnicityNon-Hispanic             -0.06   0.13    -0.46   0.65
## Black:childEDsecondary                                        
## ethnicityNon-Hispanic              0.18   0.13     1.39   0.17
## White:childEDsecondary                                        
## ethnicityOther                    -0.07   0.20    -0.37   0.71
## Hispanic:childEDsecondary                                     
## ethnicityOther or                  0.01   0.17     0.07   0.95
## Multi:childEDsecondary                                        
## childEDsecondary:fplfamily         0.00   0.12     0.01   1.00
## income 2x poverty threshold                                   
## childEDsecondary:fplfamily         0.10   0.14     0.69   0.49
## income 3x poverty threshold                                   
## childEDsecondary:fplfamily        -0.07   0.17    -0.41   0.68
## income 4x poverty threshold                                   
## childEDsecondary:fplfamily         0.21   0.21     1.00   0.32
## income 5x poverty threshold                                   
## childEDsecondary:fplfamily         0.30   0.17     1.79   0.08
## income more than 5x poverty                                   
## threshold                                                     
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 1.83
summ(model_child, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ() reports 95% CIs by default
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.109
## Adj. R² = -0.111 
## 
## Standard errors: Robust
## ---------------------------------------------------------------------------
##                                     Est.     2.5%    97.5%   t val.       p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept)                        3.736    3.549    3.923   39.624   0.000
## refEDpartial college and           0.322    0.189    0.454    4.827   0.000
## below                                                                      
## ageyoung adult                    -0.109   -0.310    0.092   -1.080   0.283
## gendermale                        -0.185   -0.273   -0.097   -4.182   0.000
## ethnicityNon-Hispanic              0.671    0.528    0.815    9.297   0.000
## Black                                                                      
## ethnicityNon-Hispanic             -0.237   -0.380   -0.094   -3.283   0.001
## White                                                                      
## ethnicityOther Hispanic            0.289    0.046    0.532    2.363   0.020
## ethnicityOther or Multi           -0.245   -0.461   -0.028   -2.239   0.027
## childEDsecondary                   0.454    0.214    0.693    3.761   0.000
## fplfamily income 2x poverty       -0.065   -0.181    0.050   -1.124   0.264
## threshold                                                                  
## fplfamily income 3x poverty       -0.022   -0.188    0.145   -0.259   0.796
## threshold                                                                  
## fplfamily income 4x poverty        0.100   -0.069    0.269    1.170   0.245
## threshold                                                                  
## fplfamily income 5x poverty        0.044   -0.181    0.269    0.387   0.700
## threshold                                                                  
## fplfamily income more than        -0.188   -0.393    0.017   -1.823   0.071
## 5x poverty threshold                                                       
## citizenshipnot U,S,                0.138   -0.049    0.325    1.461   0.147
## citizen                                                                    
## ethnicityNon-Hispanic             -0.058   -0.310    0.193   -0.460   0.646
## Black:childEDsecondary                                                     
## ethnicityNon-Hispanic              0.175   -0.076    0.426    1.385   0.169
## White:childEDsecondary                                                     
## ethnicityOther                    -0.072   -0.462    0.317   -0.369   0.713
## Hispanic:childEDsecondary                                                  
## ethnicityOther or                  0.011   -0.323    0.346    0.068   0.946
## Multi:childEDsecondary                                                     
## childEDsecondary:fplfamily         0.001   -0.245    0.247    0.006   0.995
## income 2x poverty threshold                                                
## childEDsecondary:fplfamily         0.099   -0.186    0.385    0.689   0.492
## income 3x poverty threshold                                                
## childEDsecondary:fplfamily        -0.068   -0.397    0.261   -0.408   0.684
## income 4x poverty threshold                                                
## childEDsecondary:fplfamily         0.208   -0.205    0.621    0.999   0.320
## income 5x poverty threshold                                                
## childEDsecondary:fplfamily         0.297   -0.032    0.625    1.791   0.076
## income more than 5x poverty                                                
## threshold                                                                  
## ---------------------------------------------------------------------------
## 
## Estimated dispersion parameter = 1.826
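
Because the outcome is log(monoEthyl), each coefficient is a difference on the log scale, so exponentiating it gives a ratio of geometric means relative to the reference category. The sketch below does this by hand for the Non-Hispanic Black row of the table above (estimate 0.671, 95% CI 0.528 to 0.815). The object names `nhb` and `ratio` are illustrative only, and because the model includes ethnicity-by-childED interactions, this ratio applies at the reference level of childED.

```r
# Log-scale estimate and 95% CI copied from the summ() output above
nhb <- c(est = 0.671, lower = 0.528, upper = 0.815)

# Back-transform: exp() turns a log difference into a ratio of geometric means
ratio <- exp(nhb)
round(ratio, 2)

# The same quantities as percent differences from the reference category
round(100 * (ratio - 1), 0)
```

The same back-transformation can be applied to any row of the table; intervals that include 0 on the log scale include 1 on the ratio scale.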
summ(model_child, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.11 
## 
## Standard errors: Robust
## ----------------------------------------------------------------
##                                    Est.    2.5%   97.5%   t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept)                        3.74    3.55    3.92    39.62
## refEDpartial college and           0.32    0.19    0.45     4.83
## below                                                           
## ageyoung adult                    -0.11   -0.31    0.09    -1.08
## gendermale                        -0.18   -0.27   -0.10    -4.18
## ethnicityNon-Hispanic              0.67    0.53    0.81     9.30
## Black                                                           
## ethnicityNon-Hispanic             -0.24   -0.38   -0.09    -3.28
## White                                                           
## ethnicityOther Hispanic            0.29    0.05    0.53     2.36
## ethnicityOther or Multi           -0.24   -0.46   -0.03    -2.24
## childEDsecondary                   0.45    0.21    0.69     3.76
## fplfamily income 2x poverty       -0.07   -0.18    0.05    -1.12
## threshold                                                       
## fplfamily income 3x poverty       -0.02   -0.19    0.14    -0.26
## threshold                                                       
## fplfamily income 4x poverty        0.10   -0.07    0.27     1.17
## threshold                                                       
## fplfamily income 5x poverty        0.04   -0.18    0.27     0.39
## threshold                                                       
## fplfamily income more than        -0.19   -0.39    0.02    -1.82
## 5x poverty threshold                                            
## citizenshipnot U,S,                0.14   -0.05    0.33     1.46
## citizen                                                         
## ethnicityNon-Hispanic             -0.06   -0.31    0.19    -0.46
## Black:childEDsecondary                                          
## ethnicityNon-Hispanic              0.18   -0.08    0.43     1.39
## White:childEDsecondary                                          
## ethnicityOther                    -0.07   -0.46    0.32    -0.37
## Hispanic:childEDsecondary                                       
## ethnicityOther or                  0.01   -0.32    0.35     0.07
## Multi:childEDsecondary                                          
## childEDsecondary:fplfamily         0.00   -0.25    0.25     0.01
## income 2x poverty threshold                                     
## childEDsecondary:fplfamily         0.10   -0.19    0.38     0.69
## income 3x poverty threshold                                     
## childEDsecondary:fplfamily        -0.07   -0.40    0.26    -0.41
## income 4x poverty threshold                                     
## childEDsecondary:fplfamily         0.21   -0.20    0.62     1.00
## income 5x poverty threshold                                     
## childEDsecondary:fplfamily         0.30   -0.03    0.63     1.79
## income more than 5x poverty                                     
## threshold                                                       
## ----------------------------------------------------------------
## 
## Estimated dispersion parameter = 1.83
# THE GRAPH
plot_summs(model_child)

plot_summs(model_child, inner_ci_level = .9)

plot_summs(model_child, robust = TRUE)

# plot coefficient uncertainty as normal distributions
plot_summs(model_child, plot.distributions = TRUE, inner_ci_level = .9)

# table output for Word and RMarkdown documents
# standard errors are shown in parentheses
export_summs(model_child, scale = TRUE)
Model 1
Term                                                               Est.       S.E.
(Intercept)                                                        3.74 ***   (0.09)
refEDpartial college and below                                     0.32 ***   (0.07)
ageyoung adult                                                    -0.11       (0.10)
gendermale                                                        -0.18 ***   (0.04)
ethnicityNon-Hispanic Black                                        0.67 ***   (0.07)
ethnicityNon-Hispanic White                                       -0.24 **    (0.07)
ethnicityOther Hispanic                                            0.29 *     (0.12)
ethnicityOther or Multi                                           -0.24 *     (0.11)
childEDsecondary                                                   0.45 ***   (0.12)
fplfamily income 2x poverty threshold                             -0.07       (0.06)
fplfamily income 3x poverty threshold                             -0.02       (0.08)
fplfamily income 4x poverty threshold                              0.10       (0.09)
fplfamily income 5x poverty threshold                              0.04       (0.11)
fplfamily income more than 5x poverty threshold                   -0.19       (0.10)
citizenshipnot U,S, citizen                                        0.14       (0.09)
ethnicityNon-Hispanic Black:childEDsecondary                      -0.06       (0.13)
ethnicityNon-Hispanic White:childEDsecondary                       0.18       (0.13)
ethnicityOther Hispanic:childEDsecondary                          -0.07       (0.20)
ethnicityOther or Multi:childEDsecondary                           0.01       (0.17)
childEDsecondary:fplfamily income 2x poverty threshold             0.00       (0.12)
childEDsecondary:fplfamily income 3x poverty threshold             0.10       (0.14)
childEDsecondary:fplfamily income 4x poverty threshold            -0.07       (0.17)
childEDsecondary:fplfamily income 5x poverty threshold             0.21       (0.21)
childEDsecondary:fplfamily income more than 5x poverty threshold   0.30       (0.17)
N                                                                  6619
R²                                                                 0.11
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# confidence intervals instead of standard errors
export_summs(model_child, scale = TRUE,
             error_format = "[{conf.low}, {conf.high}]")
Model 1
Term                                                               Est.       95% CI
(Intercept)                                                        3.74 ***   [3.55, 3.92]
refEDpartial college and below                                     0.32 ***   [0.19, 0.45]
ageyoung adult                                                    -0.11       [-0.31, 0.09]
gendermale                                                        -0.18 ***   [-0.27, -0.10]
ethnicityNon-Hispanic Black                                        0.67 ***   [0.53, 0.81]
ethnicityNon-Hispanic White                                       -0.24 **    [-0.38, -0.09]
ethnicityOther Hispanic                                            0.29 *     [0.05, 0.53]
ethnicityOther or Multi                                           -0.24 *     [-0.46, -0.03]
childEDsecondary                                                   0.45 ***   [0.21, 0.69]
fplfamily income 2x poverty threshold                             -0.07       [-0.18, 0.05]
fplfamily income 3x poverty threshold                             -0.02       [-0.19, 0.14]
fplfamily income 4x poverty threshold                              0.10       [-0.07, 0.27]
fplfamily income 5x poverty threshold                              0.04       [-0.18, 0.27]
fplfamily income more than 5x poverty threshold                   -0.19       [-0.39, 0.02]
citizenshipnot U,S, citizen                                        0.14       [-0.05, 0.33]
ethnicityNon-Hispanic Black:childEDsecondary                      -0.06       [-0.31, 0.19]
ethnicityNon-Hispanic White:childEDsecondary                       0.18       [-0.08, 0.43]
ethnicityOther Hispanic:childEDsecondary                          -0.07       [-0.46, 0.32]
ethnicityOther or Multi:childEDsecondary                           0.01       [-0.32, 0.35]
childEDsecondary:fplfamily income 2x poverty threshold             0.00       [-0.25, 0.25]
childEDsecondary:fplfamily income 3x poverty threshold             0.10       [-0.19, 0.38]
childEDsecondary:fplfamily income 4x poverty threshold            -0.07       [-0.40, 0.26]
childEDsecondary:fplfamily income 5x poverty threshold             0.21       [-0.20, 0.62]
childEDsecondary:fplfamily income more than 5x poverty threshold   0.30       [-0.03, 0.63]
N                                                                  6619
R²                                                                 0.11
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
### check AIC of Model D for interaction
subset_child <- subset(nhc, RIDAGEYR <= 19)

model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+childED, design=subset_child, na.action = na.omit)

ols_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+childED, design=subset_child, na.action = na.omit)
ols_child 
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, RIDAGEYR <= 19)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl + citizenship + childED, design = subset_child, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         3.68377  
##                  refEDpartial college and below  
##                                         0.32922  
##                                  ageyoung adult  
##                                        -0.12090  
##                                      gendermale  
##                                        -0.18140  
##                     ethnicityNon-Hispanic Black  
##                                         0.64594  
##                     ethnicityNon-Hispanic White  
##                                        -0.18883  
##                         ethnicityOther Hispanic  
##                                         0.26191  
##                         ethnicityOther or Multi  
##                                        -0.24214  
##           fplfamily income 2x poverty threshold  
##                                        -0.06454  
##           fplfamily income 3x poverty threshold  
##                                         0.00301  
##           fplfamily income 4x poverty threshold  
##                                         0.07541  
##           fplfamily income 5x poverty threshold  
##                                         0.10752  
## fplfamily income more than 5x poverty threshold  
##                                        -0.08216  
##                     citizenshipnot U,S, citizen  
##                                         0.12371  
##                                childEDsecondary  
##                                         0.62523  
## 
## Degrees of Freedom: 6618 Total (i.e. Null);  110 Residual
##   (1633 observations deleted due to missingness)
## Null Deviance:       13560 
## Residual Deviance: 12130     AIC: 24780
# this gives an AIC of 24,780
# (without gender, AIC of 24,800)
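
The AIC checks in this section are read off the printed fits one at a time. A minimal sketch of collecting them into a single table follows. It assumes the printed value is the one stored in the fitted object's `aic` element (as it is for `glm` objects), and it uses the `api` example data shipped with the survey package as a stand-in, with hypothetical formulas and object names (`demo_design`, `candidates`, `aic_table`), since the NHANES file is not reproduced here.

```r
library(survey)

# Stand-in data: the `api` example that ships with the survey package
data(api)
demo_design <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Hypothetical candidates: main effects only vs. one added interaction
candidates <- list(
  main        = api00 ~ ell + meals + stype,
  interaction = api00 ~ ell + meals * stype
)

# Fit every candidate on the same design object
fits <- lapply(candidates, function(f) svyglm(f, design = demo_design))

# Pull the stored AIC from each fit (assumed to match the printed value)
aic_table <- data.frame(
  model = names(fits),
  AIC   = vapply(fits, function(m) m$aic, numeric(1))
)
aic_table[order(aic_table$AIC), ]
```

Fitting every candidate on one design object keeps the comparison on the same set of observations.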


# checking childED*fpl
model_child_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*childED+citizenship, design=subset_child, na.action = na.omit)

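# Note: fit on the full design (nhc), not subset_child; na.omit leaves 6,619 observations, matching the child subset
ols_child_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*childED+citizenship, design=nhc, na.action = na.omit)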
ols_child_int
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl * childED + citizenship, design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                                      (Intercept)  
##                                                        3.7316807  
##                                   refEDpartial college and below  
##                                                        0.3181659  
##                                                   ageyoung adult  
##                                                       -0.1059382  
##                                                       gendermale  
##                                                       -0.1854323  
##                                      ethnicityNon-Hispanic Black  
##                                                        0.6476323  
##                                      ethnicityNon-Hispanic White  
##                                                       -0.1826920  
##                                          ethnicityOther Hispanic  
##                                                        0.2657327  
##                                          ethnicityOther or Multi  
##                                                       -0.2422974  
##                            fplfamily income 2x poverty threshold  
##                                                       -0.0765838  
##                            fplfamily income 3x poverty threshold  
##                                                       -0.0468372  
##                            fplfamily income 4x poverty threshold  
##                                                        0.0735603  
##                            fplfamily income 5x poverty threshold  
##                                                        0.0106034  
##                  fplfamily income more than 5x poverty threshold  
##                                                       -0.2248540  
##                                                 childEDsecondary  
##                                                        0.5022878  
##                                      citizenshipnot U,S, citizen  
##                                                        0.1316535  
##           fplfamily income 2x poverty threshold:childEDsecondary  
##                                                        0.0243319  
##           fplfamily income 3x poverty threshold:childEDsecondary  
##                                                        0.1486179  
##           fplfamily income 4x poverty threshold:childEDsecondary  
##                                                       -0.0001807  
##           fplfamily income 5x poverty threshold:childEDsecondary  
##                                                        0.2895914  
## fplfamily income more than 5x poverty threshold:childEDsecondary  
##                                                        0.3903288  
## 
## Degrees of Freedom: 6618 Total (i.e. Null);  105 Residual
##   (15730 observations deleted due to missingness)
## Null Deviance:       8327 
## Residual Deviance: 7426  AIC: 24770
# this gives an AIC of 24,770

# checking childED*ethnicity
model_child_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*childED+fpl+citizenship, design=subset_child, na.action = na.omit)

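# Note: fit on the full design (nhc), not subset_child; na.omit leaves 6,619 observations, matching the child subset
ols_child_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*childED+fpl+citizenship, design=nhc, na.action = na.omit)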
ols_child_int
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity * 
##     childED + fpl + citizenship, design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         3.71648  
##                  refEDpartial college and below  
##                                         0.33079  
##                                  ageyoung adult  
##                                        -0.11898  
##                                      gendermale  
##                                        -0.18235  
##                     ethnicityNon-Hispanic Black  
##                                         0.67001  
##                     ethnicityNon-Hispanic White  
##                                        -0.26213  
##                         ethnicityOther Hispanic  
##                                         0.28426  
##                         ethnicityOther or Multi  
##                                        -0.25562  
##                                childEDsecondary  
##                                         0.48259  
##           fplfamily income 2x poverty threshold  
##                                        -0.06141  
##           fplfamily income 3x poverty threshold  
##                                         0.01397  
##           fplfamily income 4x poverty threshold  
##                                         0.07948  
##           fplfamily income 5x poverty threshold  
##                                         0.11532  
## fplfamily income more than 5x poverty threshold  
##                                        -0.07819  
##                     citizenshipnot U,S, citizen  
##                                         0.13473  
##    ethnicityNon-Hispanic Black:childEDsecondary  
##                                        -0.05033  
##    ethnicityNon-Hispanic White:childEDsecondary  
##                                         0.24679  
##        ethnicityOther Hispanic:childEDsecondary  
##                                        -0.06259  
##        ethnicityOther or Multi:childEDsecondary  
##                                         0.05825  
## 
## Degrees of Freedom: 6618 Total (i.e. Null);  106 Residual
##   (15730 observations deleted due to missingness)
## Null Deviance:       8327 
## Residual Deviance: 7430  AIC: 24770
# this gives an AIC of 24,770

# checking childED*citizenship
model_child_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship*childED, design=subset_child, na.action = na.omit)

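# same specification fitted on the full design (nhc); rows with missing values are dropped by na.omit
ols_child_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship*childED, design=nhc, na.action = na.omit)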
ols_child_int
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl + citizenship * childED, design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                        3.684059  
##                  refEDpartial college and below  
##                                        0.329206  
##                                  ageyoung adult  
##                                       -0.121009  
##                                      gendermale  
##                                       -0.181365  
##                     ethnicityNon-Hispanic Black  
##                                        0.645824  
##                     ethnicityNon-Hispanic White  
##                                       -0.188952  
##                         ethnicityOther Hispanic  
##                                        0.261644  
##                         ethnicityOther or Multi  
##                                       -0.242197  
##           fplfamily income 2x poverty threshold  
##                                       -0.064534  
##           fplfamily income 3x poverty threshold  
##                                        0.002993  
##           fplfamily income 4x poverty threshold  
##                                        0.075404  
##           fplfamily income 5x poverty threshold  
##                                        0.107554  
## fplfamily income more than 5x poverty threshold  
##                                       -0.082134  
##                     citizenshipnot U,S, citizen  
##                                        0.118607  
##                                childEDsecondary  
##                                        0.624637  
##    citizenshipnot U,S, citizen:childEDsecondary  
##                                        0.013054  
## 
## Degrees of Freedom: 6618 Total (i.e. Null);  109 Residual
##   (15730 observations deleted due to missingness)
## Null Deviance:       8327 
## Residual Deviance: 7446  AIC: 24780
# this gives an AIC of 24,780
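
Alongside the AIC values noted above, an interaction can also be checked with a design-based Wald test of the interaction coefficients, using regTermTest from the survey package. The following is only a sketch and not part of the original analysis: it assumes the api example data that ships with the survey package, so the variables (api00, ell, meals, stype) belong to that dataset and not to NHANES.

```r
library(survey)

# Illustrative data: the California API sample bundled with the survey package
# (an assumption for this sketch; not the NHANES data used in this analysis).
data(api)
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

# Log-outcome model with a meals-by-school-type interaction
m_int <- svyglm(log(api00) ~ ell + meals * stype, design = dclus2)

# Wald test that all meals:stype interaction coefficients are zero
int_test <- regTermTest(m_int, ~meals:stype)
print(int_test)
```

A large p-value from this test would point the same way as a negligible AIC difference: little evidence that the interaction is needed.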

no subset: WHOLE sample

model_whole <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)

summ(model_whole)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.71   0.07    51.36   0.00
## refEDpartial college and           0.35   0.05     7.77   0.00
## below                                                         
## agemiddle-aged                     0.36   0.03    10.78   0.00
## ageolder adult                     0.35   0.06     6.09   0.00
## ageyoung adult                     0.44   0.05     8.15   0.00
## ethnicityNon-Hispanic              0.57   0.07     8.39   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.29   0.06    -4.45   0.00
## White                                                         
## ethnicityOther Hispanic            0.15   0.08     1.92   0.06
## ethnicityOther or Multi           -0.57   0.08    -6.83   0.00
## fplfamily income 2x poverty        0.03   0.04     0.69   0.49
## threshold                                                     
## fplfamily income 3x poverty        0.06   0.06     1.09   0.28
## threshold                                                     
## fplfamily income 4x poverty        0.11   0.06     1.82   0.07
## threshold                                                     
## fplfamily income 5x poverty        0.12   0.06     1.85   0.07
## threshold                                                     
## fplfamily income more than         0.09   0.06     1.53   0.13
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.17   0.06     2.90   0.00
## citizen                                                       
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
summ(model_whole, robust = "HC1") # no effect for svyglm objects: design-based (robust) standard errors are already the default, hence the warning below
## Warning in summ.svyglm(model_whole, robust = "HC1"): Robust standard errors are reported by default
##  in the survey package.
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.71   0.07    51.36   0.00
## refEDpartial college and           0.35   0.05     7.77   0.00
## below                                                         
## agemiddle-aged                     0.36   0.03    10.78   0.00
## ageolder adult                     0.35   0.06     6.09   0.00
## ageyoung adult                     0.44   0.05     8.15   0.00
## ethnicityNon-Hispanic              0.57   0.07     8.39   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.29   0.06    -4.45   0.00
## White                                                         
## ethnicityOther Hispanic            0.15   0.08     1.92   0.06
## ethnicityOther or Multi           -0.57   0.08    -6.83   0.00
## fplfamily income 2x poverty        0.03   0.04     0.69   0.49
## threshold                                                     
## fplfamily income 3x poverty        0.06   0.06     1.09   0.28
## threshold                                                     
## fplfamily income 4x poverty        0.11   0.06     1.82   0.07
## threshold                                                     
## fplfamily income 5x poverty        0.12   0.06     1.85   0.07
## threshold                                                     
## fplfamily income more than         0.09   0.06     1.53   0.13
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.17   0.06     2.90   0.00
## citizen                                                       
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
summ(model_whole, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; confint = TRUE requests them (95% CIs by default)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.060
## Adj. R² = -0.080 
## 
## Standard errors: Robust
## ---------------------------------------------------------------------------
##                                     Est.     2.5%    97.5%   t val.       p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept)                        3.712    3.569    3.855   51.356   0.000
## refEDpartial college and           0.355    0.264    0.445    7.765   0.000
## below                                                                      
## agemiddle-aged                     0.363    0.296    0.430   10.782   0.000
## ageolder adult                     0.350    0.236    0.464    6.089   0.000
## ageyoung adult                     0.443    0.335    0.551    8.155   0.000
## ethnicityNon-Hispanic              0.570    0.435    0.704    8.388   0.000
## Black                                                                      
## ethnicityNon-Hispanic             -0.287   -0.415   -0.159   -4.452   0.000
## White                                                                      
## ethnicityOther Hispanic            0.146   -0.004    0.297    1.923   0.057
## ethnicityOther or Multi           -0.566   -0.730   -0.402   -6.834   0.000
## fplfamily income 2x poverty        0.031   -0.057    0.118    0.693   0.490
## threshold                                                                  
## fplfamily income 3x poverty        0.060   -0.049    0.170    1.094   0.277
## threshold                                                                  
## fplfamily income 4x poverty        0.107   -0.009    0.224    1.822   0.071
## threshold                                                                  
## fplfamily income 5x poverty        0.118   -0.008    0.244    1.854   0.066
## threshold                                                                  
## fplfamily income more than         0.094   -0.028    0.216    1.533   0.128
## 5x poverty threshold                                                       
## citizenshipnot U,S,                0.168    0.053    0.283    2.899   0.005
## citizen                                                                    
## ---------------------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.372
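
Because the outcome is log(monoEthyl), each coefficient is a difference on the log scale; exponentiating it gives a ratio of geometric means. A minimal sketch, using the refED estimate and interval copied from the table above (the helper name log_coef_to_ratio is hypothetical and not part of any package):

```r
# Hypothetical helper: convert a log-scale coefficient and its CI bounds
# into a ratio of geometric means and a percent difference.
log_coef_to_ratio <- function(est, lower, upper) {
  ratio <- exp(c(estimate = est, lower = lower, upper = upper))
  rbind(ratio = ratio, percent_diff = 100 * (ratio - 1))
}

# refED "partial college and below" row from the summ() output above
refED_effect <- log_coef_to_ratio(est = 0.355, lower = 0.264, upper = 0.445)
print(round(refED_effect, 2))
```

Read this way, the refED coefficient corresponds to a geometric-mean concentration roughly 43% higher (about 30% to 56%) than in the reference education category, holding the other covariates fixed.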
summ(model_whole, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08 
## 
## Standard errors: Robust
## ----------------------------------------------------------------
##                                    Est.    2.5%   97.5%   t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept)                        3.71    3.57    3.86    51.36
## refEDpartial college and           0.35    0.26    0.45     7.77
## below                                                           
## agemiddle-aged                     0.36    0.30    0.43    10.78
## ageolder adult                     0.35    0.24    0.46     6.09
## ageyoung adult                     0.44    0.34    0.55     8.15
## ethnicityNon-Hispanic              0.57    0.44    0.70     8.39
## Black                                                           
## ethnicityNon-Hispanic             -0.29   -0.42   -0.16    -4.45
## White                                                           
## ethnicityOther Hispanic            0.15   -0.00    0.30     1.92
## ethnicityOther or Multi           -0.57   -0.73   -0.40    -6.83
## fplfamily income 2x poverty        0.03   -0.06    0.12     0.69
## threshold                                                       
## fplfamily income 3x poverty        0.06   -0.05    0.17     1.09
## threshold                                                       
## fplfamily income 4x poverty        0.11   -0.01    0.22     1.82
## threshold                                                       
## fplfamily income 5x poverty        0.12   -0.01    0.24     1.85
## threshold                                                       
## fplfamily income more than         0.09   -0.03    0.22     1.53
## 5x poverty threshold                                            
## citizenshipnot U,S,                0.17    0.05    0.28     2.90
## citizen                                                         
## ----------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
# THE GRAPH
plot_summs(model_whole)

plot_summs(model_whole, inner_ci_level = .9)

plot_summs(model_whole, robust = TRUE)

# plot coefficient uncertainty as normal distributions
plot_summs(model_whole, plot.distributions = TRUE, inner_ci_level = .9)

# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(model_whole, scale = TRUE)
                                                  Model 1
(Intercept)                                       3.71 ***    (0.07)
refEDpartial college and below                    0.35 ***    (0.05)
agemiddle-aged                                    0.36 ***    (0.03)
ageolder adult                                    0.35 ***    (0.06)
ageyoung adult                                    0.44 ***    (0.05)
ethnicityNon-Hispanic Black                       0.57 ***    (0.07)
ethnicityNon-Hispanic White                       -0.29 ***   (0.06)
ethnicityOther Hispanic                           0.15        (0.08)
ethnicityOther or Multi                           -0.57 ***   (0.08)
fplfamily income 2x poverty threshold             0.03        (0.04)
fplfamily income 3x poverty threshold             0.06        (0.06)
fplfamily income 4x poverty threshold             0.11        (0.06)
fplfamily income 5x poverty threshold             0.12        (0.06)
fplfamily income more than 5x poverty threshold   0.09        (0.06)
citizenshipnot U,S, citizen                       0.17 **     (0.06)
N                                                 19218
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# confidence intervals instead of standard errors
export_summs(model_whole, scale = TRUE,
             error_format = "[{conf.low}, {conf.high}]")
                                                  Model 1
(Intercept)                                       3.71 ***    [3.57, 3.86]
refEDpartial college and below                    0.35 ***    [0.26, 0.45]
agemiddle-aged                                    0.36 ***    [0.30, 0.43]
ageolder adult                                    0.35 ***    [0.24, 0.46]
ageyoung adult                                    0.44 ***    [0.34, 0.55]
ethnicityNon-Hispanic Black                       0.57 ***    [0.44, 0.70]
ethnicityNon-Hispanic White                       -0.29 ***   [-0.42, -0.16]
ethnicityOther Hispanic                           0.15        [-0.00, 0.30]
ethnicityOther or Multi                           -0.57 ***   [-0.73, -0.40]
fplfamily income 2x poverty threshold             0.03        [-0.06, 0.12]
fplfamily income 3x poverty threshold             0.06        [-0.05, 0.17]
fplfamily income 4x poverty threshold             0.11        [-0.01, 0.22]
fplfamily income 5x poverty threshold             0.12        [-0.01, 0.24]
fplfamily income more than 5x poverty threshold   0.09        [-0.03, 0.22]
citizenshipnot U,S, citizen                       0.17 **     [0.05, 0.28]
N                                                 19218
R²                                                0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.

INTERACTIONS?

whole population:

# fit one of the following four model_whole specifications at a time (uncomment the one to test) to check each interaction

## interaction: refED*age
# model_whole <- svyglm(log(monoEthyl)~refED*age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)

## interaction: age*fpl
# model_whole <- svyglm(log(monoEthyl)~refED+gender+ethnicity+age*fpl+citizenship, design=nhc, na.action = na.omit)

## interaction: ethnicity*fpl
# model_whole <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*fpl+citizenship, design=nhc, na.action = na.omit)

## interaction: refED*fpl
model_whole <- svyglm(log(monoEthyl)~refED*fpl+gender+ethnicity+age+citizenship, design=nhc, na.action = na.omit)

summ(model_whole)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.14 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.76   0.14    26.48   0.00
## refEDpartial college and           0.31   0.14     2.25   0.03
## below                                                         
## fplfamily income 2x poverty        0.05   0.14     0.38   0.71
## threshold                                                     
## fplfamily income 3x poverty        0.12   0.17     0.74   0.46
## threshold                                                     
## fplfamily income 4x poverty        0.23   0.15     1.58   0.12
## threshold                                                     
## fplfamily income 5x poverty       -0.06   0.15    -0.41   0.68
## threshold                                                     
## fplfamily income more than         0.01   0.14     0.06   0.96
## 5x poverty threshold                                          
## gendermale                        -0.01   0.03    -0.18   0.85
## ethnicityNon-Hispanic              0.57   0.07     8.32   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.29   0.06    -4.54   0.00
## White                                                         
## ethnicityOther Hispanic            0.14   0.08     1.86   0.07
## ethnicityOther or Multi           -0.58   0.08    -6.94   0.00
## agemiddle-aged                     0.36   0.03    10.88   0.00
## ageolder adult                     0.35   0.06     6.16   0.00
## ageyoung adult                     0.44   0.05     8.06   0.00
## citizenshipnot U,S,                0.17   0.06     2.98   0.00
## citizen                                                       
## refEDpartial college and          -0.03   0.15    -0.19   0.85
## below:fplfamily income 2x                                     
## poverty threshold                                             
## refEDpartial college and          -0.09   0.18    -0.48   0.63
## below:fplfamily income 3x                                     
## poverty threshold                                             
## refEDpartial college and          -0.19   0.16    -1.19   0.24
## below:fplfamily income 4x                                     
## poverty threshold                                             
## refEDpartial college and           0.27   0.17     1.58   0.12
## below:fplfamily income 5x                                     
## poverty threshold                                             
## refEDpartial college and           0.15   0.15     1.00   0.32
## below:fplfamily income more                                   
## than 5x poverty threshold                                     
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
summ(model_whole, robust = "HC1") # no effect for svyglm objects: design-based (robust) standard errors are already the default, hence the warning below
## Warning in summ.svyglm(model_whole, robust = "HC1"): Robust standard errors are reported by default
##  in the survey package.
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.14 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.76   0.14    26.48   0.00
## refEDpartial college and           0.31   0.14     2.25   0.03
## below                                                         
## fplfamily income 2x poverty        0.05   0.14     0.38   0.71
## threshold                                                     
## fplfamily income 3x poverty        0.12   0.17     0.74   0.46
## threshold                                                     
## fplfamily income 4x poverty        0.23   0.15     1.58   0.12
## threshold                                                     
## fplfamily income 5x poverty       -0.06   0.15    -0.41   0.68
## threshold                                                     
## fplfamily income more than         0.01   0.14     0.06   0.96
## 5x poverty threshold                                          
## gendermale                        -0.01   0.03    -0.18   0.85
## ethnicityNon-Hispanic              0.57   0.07     8.32   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.29   0.06    -4.54   0.00
## White                                                         
## ethnicityOther Hispanic            0.14   0.08     1.86   0.07
## ethnicityOther or Multi           -0.58   0.08    -6.94   0.00
## agemiddle-aged                     0.36   0.03    10.88   0.00
## ageolder adult                     0.35   0.06     6.16   0.00
## ageyoung adult                     0.44   0.05     8.06   0.00
## citizenshipnot U,S,                0.17   0.06     2.98   0.00
## citizen                                                       
## refEDpartial college and          -0.03   0.15    -0.19   0.85
## below:fplfamily income 2x                                     
## poverty threshold                                             
## refEDpartial college and          -0.09   0.18    -0.48   0.63
## below:fplfamily income 3x                                     
## poverty threshold                                             
## refEDpartial college and          -0.19   0.16    -1.19   0.24
## below:fplfamily income 4x                                     
## poverty threshold                                             
## refEDpartial college and           0.27   0.17     1.58   0.12
## below:fplfamily income 5x                                     
## poverty threshold                                             
## refEDpartial college and           0.15   0.15     1.00   0.32
## below:fplfamily income more                                   
## than 5x poverty threshold                                     
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
summ(model_whole, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; confint = TRUE requests them (95% CIs by default)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.062
## Adj. R² = -0.141 
## 
## Standard errors: Robust
## ---------------------------------------------------------------------------
##                                     Est.     2.5%    97.5%   t val.       p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept)                        3.762    3.481    4.044   26.483   0.000
## refEDpartial college and           0.307    0.036    0.577    2.248   0.027
## below                                                                      
## fplfamily income 2x poverty        0.053   -0.226    0.332    0.378   0.707
## threshold                                                                  
## fplfamily income 3x poverty        0.124   -0.209    0.458    0.740   0.461
## threshold                                                                  
## fplfamily income 4x poverty        0.234   -0.059    0.526    1.583   0.116
## threshold                                                                  
## fplfamily income 5x poverty       -0.060   -0.348    0.228   -0.411   0.682
## threshold                                                                  
## fplfamily income more than         0.008   -0.274    0.290    0.056   0.955
## 5x poverty threshold                                                       
## gendermale                        -0.005   -0.064    0.053   -0.185   0.854
## ethnicityNon-Hispanic              0.566    0.431    0.701    8.323   0.000
## Black                                                                      
## ethnicityNon-Hispanic             -0.290   -0.417   -0.164   -4.544   0.000
## White                                                                      
## ethnicityOther Hispanic            0.141   -0.009    0.292    1.860   0.066
## ethnicityOther or Multi           -0.578   -0.743   -0.413   -6.938   0.000
## agemiddle-aged                     0.363    0.297    0.429   10.877   0.000
## ageolder adult                     0.352    0.239    0.466    6.155   0.000
## ageyoung adult                     0.440    0.331    0.548    8.059   0.000
## citizenshipnot U,S,                0.172    0.058    0.286    2.981   0.004
## citizen                                                                    
## refEDpartial college and          -0.028   -0.321    0.265   -0.190   0.850
## below:fplfamily income 2x                                                  
## poverty threshold                                                          
## refEDpartial college and          -0.085   -0.438    0.267   -0.479   0.633
## below:fplfamily income 3x                                                  
## poverty threshold                                                          
## refEDpartial college and          -0.190   -0.507    0.128   -1.186   0.238
## below:fplfamily income 4x                                                  
## poverty threshold                                                          
## refEDpartial college and           0.269   -0.069    0.607    1.579   0.117
## below:fplfamily income 5x                                                  
## poverty threshold                                                          
## refEDpartial college and           0.151   -0.147    0.449    1.004   0.318
## below:fplfamily income more                                                
## than 5x poverty threshold                                                  
## ---------------------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.368
summ(model_whole, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.14 
## 
## Standard errors: Robust
## ----------------------------------------------------------------
##                                    Est.    2.5%   97.5%   t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept)                        3.76    3.48    4.04    26.48
## refEDpartial college and           0.31    0.04    0.58     2.25
## below                                                           
## fplfamily income 2x poverty        0.05   -0.23    0.33     0.38
## threshold                                                       
## fplfamily income 3x poverty        0.12   -0.21    0.46     0.74
## threshold                                                       
## fplfamily income 4x poverty        0.23   -0.06    0.53     1.58
## threshold                                                       
## fplfamily income 5x poverty       -0.06   -0.35    0.23    -0.41
## threshold                                                       
## fplfamily income more than         0.01   -0.27    0.29     0.06
## 5x poverty threshold                                            
## gendermale                        -0.01   -0.06    0.05    -0.18
## ethnicityNon-Hispanic              0.57    0.43    0.70     8.32
## Black                                                           
## ethnicityNon-Hispanic             -0.29   -0.42   -0.16    -4.54
## White                                                           
## ethnicityOther Hispanic            0.14   -0.01    0.29     1.86
## ethnicityOther or Multi           -0.58   -0.74   -0.41    -6.94
## agemiddle-aged                     0.36    0.30    0.43    10.88
## ageolder adult                     0.35    0.24    0.47     6.16
## ageyoung adult                     0.44    0.33    0.55     8.06
## citizenshipnot U,S,                0.17    0.06    0.29     2.98
## citizen                                                         
## refEDpartial college and          -0.03   -0.32    0.27    -0.19
## below:fplfamily income 2x                                       
## poverty threshold                                               
## refEDpartial college and          -0.09   -0.44    0.27    -0.48
## below:fplfamily income 3x                                       
## poverty threshold                                               
## refEDpartial college and          -0.19   -0.51    0.13    -1.19
## below:fplfamily income 4x                                       
## poverty threshold                                               
## refEDpartial college and           0.27   -0.07    0.61     1.58
## below:fplfamily income 5x                                       
## poverty threshold                                               
## refEDpartial college and           0.15   -0.15    0.45     1.00
## below:fplfamily income more                                     
## than 5x poverty threshold                                       
## ----------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.37
# THE GRAPH
plot_summs(model_whole)

plot_summs(model_whole, inner_ci_level = .9)

plot_summs(model_whole, robust = TRUE)

# plot coefficient uncertainty as normal distributions
plot_summs(model_whole, plot.distributions = TRUE, inner_ci_level = .9)

# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(model_whole, scale = TRUE)
                                                                                Model 1
(Intercept)                                                                     3.76 ***    (0.14)
refEDpartial college and below                                                  0.31 *      (0.14)
fplfamily income 2x poverty threshold                                           0.05        (0.14)
fplfamily income 3x poverty threshold                                           0.12        (0.17)
fplfamily income 4x poverty threshold                                           0.23        (0.15)
fplfamily income 5x poverty threshold                                           -0.06       (0.15)
fplfamily income more than 5x poverty threshold                                 0.01        (0.14)
gendermale                                                                      -0.01       (0.03)
ethnicityNon-Hispanic Black                                                     0.57 ***    (0.07)
ethnicityNon-Hispanic White                                                     -0.29 ***   (0.06)
ethnicityOther Hispanic                                                         0.14        (0.08)
ethnicityOther or Multi                                                         -0.58 ***   (0.08)
agemiddle-aged                                                                  0.36 ***    (0.03)
ageolder adult                                                                  0.35 ***    (0.06)
ageyoung adult                                                                  0.44 ***    (0.05)
citizenshipnot U,S, citizen                                                     0.17 **     (0.06)
refEDpartial college and below:fplfamily income 2x poverty threshold            -0.03       (0.15)
refEDpartial college and below:fplfamily income 3x poverty threshold            -0.09       (0.18)
refEDpartial college and below:fplfamily income 4x poverty threshold            -0.19       (0.16)
refEDpartial college and below:fplfamily income 5x poverty threshold            0.27        (0.17)
refEDpartial college and below:fplfamily income more than 5x poverty threshold  0.15        (0.15)
N                                                                               19218
R²                                                                              0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# confidence intervals instead of standard errors
export_summs(model_whole, scale = TRUE,
             error_format = "[{conf.low}, {conf.high}]")
                                                                                Model 1
(Intercept)                                                                     3.76 ***    [3.48, 4.04]
refEDpartial college and below                                                  0.31 *      [0.04, 0.58]
fplfamily income 2x poverty threshold                                           0.05        [-0.23, 0.33]
fplfamily income 3x poverty threshold                                           0.12        [-0.21, 0.46]
fplfamily income 4x poverty threshold                                           0.23        [-0.06, 0.53]
fplfamily income 5x poverty threshold                                           -0.06       [-0.35, 0.23]
fplfamily income more than 5x poverty threshold                                 0.01        [-0.27, 0.29]
gendermale                                                                      -0.01       [-0.06, 0.05]
ethnicityNon-Hispanic Black                                                     0.57 ***    [0.43, 0.70]
ethnicityNon-Hispanic White                                                     -0.29 ***   [-0.42, -0.16]
ethnicityOther Hispanic                                                         0.14        [-0.01, 0.29]
ethnicityOther or Multi                                                         -0.58 ***   [-0.74, -0.41]
agemiddle-aged                                                                  0.36 ***    [0.30, 0.43]
ageolder adult                                                                  0.35 ***    [0.24, 0.47]
ageyoung adult                                                                  0.44 ***    [0.33, 0.55]
citizenshipnot U,S, citizen                                                     0.17 **     [0.06, 0.29]
refEDpartial college and below:fplfamily income 2x poverty threshold            -0.03       [-0.32, 0.27]
refEDpartial college and below:fplfamily income 3x poverty threshold            -0.09       [-0.44, 0.27]
refEDpartial college and below:fplfamily income 4x poverty threshold            -0.19       [-0.51, 0.13]
refEDpartial college and below:fplfamily income 5x poverty threshold            0.27        [-0.07, 0.61]
refEDpartial college and below:fplfamily income more than 5x poverty threshold  0.15        [-0.15, 0.45]
N                                                                               19218
R²                                                                              0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.

children:

# run each of the next three "model_child" fits one at a time to check each interaction

subset_child <- subset(nhc, RIDAGEYR <= 19)

## interaction: childED*fpl
# model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*childED+citizenship, design=subset_child, na.action = na.omit)

## interaction: childED*ethnicity
# model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*childED+fpl+citizenship, design=subset_child, na.action = na.omit)

## interaction: childED*citizenship
model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship*childED, design=subset_child, na.action = na.omit)


summ(model_child)
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.03 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.68   0.10    37.95   0.00
## refEDpartial college and           0.33   0.07     4.89   0.00
## below                                                         
## ageyoung adult                    -0.12   0.10    -1.20   0.23
## gendermale                        -0.18   0.04    -4.12   0.00
## ethnicityNon-Hispanic              0.65   0.07     9.01   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.19   0.08    -2.45   0.02
## White                                                         
## ethnicityOther Hispanic            0.26   0.10     2.58   0.01
## ethnicityOther or Multi           -0.24   0.10    -2.35   0.02
## fplfamily income 2x poverty       -0.06   0.05    -1.22   0.22
## threshold                                                     
## fplfamily income 3x poverty        0.00   0.07     0.04   0.97
## threshold                                                     
## fplfamily income 4x poverty        0.08   0.08     0.93   0.35
## threshold                                                     
## fplfamily income 5x poverty        0.11   0.10     1.10   0.27
## threshold                                                     
## fplfamily income more than        -0.08   0.09    -0.91   0.36
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.12   0.11     1.06   0.29
## citizen                                                       
## childEDsecondary                   0.62   0.05    12.49   0.00
## citizenshipnot U,S,                0.01   0.19     0.07   0.94
## citizen:childEDsecondary                                      
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 1.83
summ(model_child, robust = "HC1") # redundant for svyglm: design-based (robust) standard errors are already the default, hence the warning
## Warning in summ.svyglm(model_child, robust = "HC1"): Robust standard errors are reported by default
##  in the survey package.
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.03 
## 
## Standard errors: Robust
## --------------------------------------------------------------
##                                    Est.   S.E.   t val.      p
## ------------------------------- ------- ------ -------- ------
## (Intercept)                        3.68   0.10    37.95   0.00
## refEDpartial college and           0.33   0.07     4.89   0.00
## below                                                         
## ageyoung adult                    -0.12   0.10    -1.20   0.23
## gendermale                        -0.18   0.04    -4.12   0.00
## ethnicityNon-Hispanic              0.65   0.07     9.01   0.00
## Black                                                         
## ethnicityNon-Hispanic             -0.19   0.08    -2.45   0.02
## White                                                         
## ethnicityOther Hispanic            0.26   0.10     2.58   0.01
## ethnicityOther or Multi           -0.24   0.10    -2.35   0.02
## fplfamily income 2x poverty       -0.06   0.05    -1.22   0.22
## threshold                                                     
## fplfamily income 3x poverty        0.00   0.07     0.04   0.97
## threshold                                                     
## fplfamily income 4x poverty        0.08   0.08     0.93   0.35
## threshold                                                     
## fplfamily income 5x poverty        0.11   0.10     1.10   0.27
## threshold                                                     
## fplfamily income more than        -0.08   0.09    -0.91   0.36
## 5x poverty threshold                                          
## citizenshipnot U,S,                0.12   0.11     1.06   0.29
## citizen                                                       
## childEDsecondary                   0.62   0.05    12.49   0.00
## citizenshipnot U,S,                0.01   0.19     0.07   0.94
## citizen:childEDsecondary                                      
## --------------------------------------------------------------
## 
## Estimated dispersion parameter = 1.83
summ(model_child, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ() reports 95% CIs by default
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.106
## Adj. R² = -0.033 
## 
## Standard errors: Robust
## ---------------------------------------------------------------------------
##                                     Est.     2.5%    97.5%   t val.       p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept)                        3.684    3.492    3.876   37.953   0.000
## refEDpartial college and           0.329    0.196    0.463    4.888   0.000
## below                                                                      
## ageyoung adult                    -0.121   -0.321    0.079   -1.198   0.234
## gendermale                        -0.181   -0.269   -0.094   -4.124   0.000
## ethnicityNon-Hispanic              0.646    0.504    0.788    9.012   0.000
## Black                                                                      
## ethnicityNon-Hispanic             -0.189   -0.342   -0.036   -2.452   0.016
## White                                                                      
## ethnicityOther Hispanic            0.262    0.060    0.463    2.577   0.011
## ethnicityOther or Multi           -0.242   -0.446   -0.038   -2.351   0.020
## fplfamily income 2x poverty       -0.065   -0.169    0.040   -1.222   0.224
## threshold                                                                  
## fplfamily income 3x poverty        0.003   -0.144    0.150    0.040   0.968
## threshold                                                                  
## fplfamily income 4x poverty        0.075   -0.085    0.236    0.932   0.353
## threshold                                                                  
## fplfamily income 5x poverty        0.108   -0.087    0.302    1.098   0.275
## threshold                                                                  
## fplfamily income more than        -0.082   -0.261    0.097   -0.910   0.365
## 5x poverty threshold                                                       
## citizenshipnot U,S,                0.119   -0.103    0.340    1.063   0.290
## citizen                                                                    
## childEDsecondary                   0.625    0.526    0.724   12.489   0.000
## citizenshipnot U,S,                0.013   -0.354    0.380    0.070   0.944
## citizen:childEDsecondary                                                   
## ---------------------------------------------------------------------------
## 
## Estimated dispersion parameter = 1.833
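
Because the outcome is log(monoEthyl), each coefficient is a difference on the log scale. A common way to read such a coefficient, sketched below, is to exponentiate it, which gives a ratio of geometric means relative to the reference level. The three numbers are copied from the childEDsecondary row of the table above; the variable names are illustrative only, and this is a reading aid rather than part of the original analysis.

```r
# Estimate and 95% CI for childEDsecondary, copied from the summ() output above
log_scale <- c(estimate = 0.625, lower = 0.526, upper = 0.724)

# Back-transform: exp(coefficient) is a ratio of geometric means
# (secondary vs. the reference level of childED)
ratio <- exp(log_scale)

# The same quantity expressed as a percent difference
percent_diff <- 100 * (ratio - 1)

print(round(ratio, 2))
print(round(percent_diff, 1))
```

The confidence limits are transformed the same way as the estimate, so the interval stays valid on the ratio scale but is no longer symmetric around the point estimate.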
summ(model_child, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.03 
## 
## Standard errors: Robust
## ----------------------------------------------------------------
##                                    Est.    2.5%   97.5%   t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept)                        3.68    3.49    3.88    37.95
## refEDpartial college and           0.33    0.20    0.46     4.89
## below                                                           
## ageyoung adult                    -0.12   -0.32    0.08    -1.20
## gendermale                        -0.18   -0.27   -0.09    -4.12
## ethnicityNon-Hispanic              0.65    0.50    0.79     9.01
## Black                                                           
## ethnicityNon-Hispanic             -0.19   -0.34   -0.04    -2.45
## White                                                           
## ethnicityOther Hispanic            0.26    0.06    0.46     2.58
## ethnicityOther or Multi           -0.24   -0.45   -0.04    -2.35
## fplfamily income 2x poverty       -0.06   -0.17    0.04    -1.22
## threshold                                                       
## fplfamily income 3x poverty        0.00   -0.14    0.15     0.04
## threshold                                                       
## fplfamily income 4x poverty        0.08   -0.08    0.24     0.93
## threshold                                                       
## fplfamily income 5x poverty        0.11   -0.09    0.30     1.10
## threshold                                                       
## fplfamily income more than        -0.08   -0.26    0.10    -0.91
## 5x poverty threshold                                            
## citizenshipnot U,S,                0.12   -0.10    0.34     1.06
## citizen                                                         
## childEDsecondary                   0.62    0.53    0.72    12.49
## citizenshipnot U,S,                0.01   -0.35    0.38     0.07
## citizen:childEDsecondary                                        
## ----------------------------------------------------------------
## 
## Estimated dispersion parameter = 1.83
# THE GRAPH
plot_summs(model_child)

plot_summs(model_child, inner_ci_level = .9)

plot_summs(model_child, robust = TRUE)

# plot coefficient uncertainty as normal distributions
plot_summs(model_child, plot.distributions = TRUE, inner_ci_level = .9)

# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(model_child, scale = TRUE)
Model 1
                                                    Est.      S.E.
(Intercept)                                         3.68 ***  (0.10)
refEDpartial college and below                      0.33 ***  (0.07)
ageyoung adult                                     -0.12      (0.10)
gendermale                                         -0.18 ***  (0.04)
ethnicityNon-Hispanic Black                         0.65 ***  (0.07)
ethnicityNon-Hispanic White                        -0.19 *    (0.08)
ethnicityOther Hispanic                             0.26 *    (0.10)
ethnicityOther or Multi                            -0.24 *    (0.10)
fplfamily income 2x poverty threshold              -0.06      (0.05)
fplfamily income 3x poverty threshold               0.00      (0.07)
fplfamily income 4x poverty threshold               0.08      (0.08)
fplfamily income 5x poverty threshold               0.11      (0.10)
fplfamily income more than 5x poverty threshold    -0.08      (0.09)
citizenshipnot U,S, citizen                         0.12      (0.11)
childEDsecondary                                    0.62 ***  (0.05)
citizenshipnot U,S, citizen:childEDsecondary        0.01      (0.19)
N                                                   6619
R2                                                  0.11
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# confidence intervals instead of standard errors
export_summs(model_child, scale = TRUE,
             error_format = "[{conf.low}, {conf.high}]")
Model 1
                                                    Est.      95% CI
(Intercept)                                         3.68 ***  [3.49, 3.88]
refEDpartial college and below                      0.33 ***  [0.20, 0.46]
ageyoung adult                                     -0.12      [-0.32, 0.08]
gendermale                                         -0.18 ***  [-0.27, -0.09]
ethnicityNon-Hispanic Black                         0.65 ***  [0.50, 0.79]
ethnicityNon-Hispanic White                        -0.19 *    [-0.34, -0.04]
ethnicityOther Hispanic                             0.26 *    [0.06, 0.46]
ethnicityOther or Multi                            -0.24 *    [-0.45, -0.04]
fplfamily income 2x poverty threshold              -0.06      [-0.17, 0.04]
fplfamily income 3x poverty threshold               0.00      [-0.14, 0.15]
fplfamily income 4x poverty threshold               0.08      [-0.08, 0.24]
fplfamily income 5x poverty threshold               0.11      [-0.09, 0.30]
fplfamily income more than 5x poverty threshold    -0.08      [-0.26, 0.10]
citizenshipnot U,S, citizen                         0.12      [-0.10, 0.34]
childEDsecondary                                    0.62 ***  [0.53, 0.72]
citizenshipnot U,S, citizen:childEDsecondary        0.01      [-0.35, 0.38]
N                                                   6619
R2                                                  0.11
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.

adults:

# run each of the next four "model_adult" fits one at a time to check each interaction

subset_adult <- subset(nhc, RIDAGEYR > 19)

## interaction: refED*adultED
# model_adult <- svyglm(log(monoEthyl)~refED*adultED+age+gender+ethnicity+fpl+citizenship, design=subset_adult, na.action = na.omit)

## interaction: adultED*fpl
# model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*adultED+citizenship, design=subset_adult, na.action = na.omit)

## interaction: adultED*ethnicity
# model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*adultED+fpl+citizenship, design=subset_adult, na.action = na.omit)

# interaction: adultED*citizenship
model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship*adultED, design=subset_adult, na.action = na.omit)



summ(model_adult)
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.17 
## 
## Standard errors: Robust
## ---------------------------------------------------------------
##                                     Est.   S.E.   t val.      p
## -------------------------------- ------- ------ -------- ------
## (Intercept)                         4.47   0.11    39.64   0.00
## refEDpartial college and            0.14   0.07     2.07   0.04
## below                                                          
## ageolder adult                     -0.02   0.05    -0.38   0.71
## ageyoung adult                      0.08   0.06     1.32   0.19
## gendermale                          0.03   0.04     0.95   0.34
## ethnicityNon-Hispanic               0.47   0.07     6.61   0.00
## Black                                                          
## ethnicityNon-Hispanic              -0.39   0.07    -5.74   0.00
## White                                                          
## ethnicityOther Hispanic             0.05   0.08     0.62   0.54
## ethnicityOther or Multi            -0.71   0.09    -7.58   0.00
## fplfamily income 2x poverty         0.07   0.05     1.34   0.18
## threshold                                                      
## fplfamily income 3x poverty         0.09   0.07     1.31   0.19
## threshold                                                      
## fplfamily income 4x poverty         0.15   0.07     2.12   0.04
## threshold                                                      
## fplfamily income 5x poverty         0.16   0.08     2.09   0.04
## threshold                                                      
## fplfamily income more than          0.18   0.07     2.55   0.01
## 5x poverty threshold                                           
## citizenshipnot U,S,                 0.02   0.11     0.15   0.88
## citizen                                                        
## adultEDcollege grad or             -0.41   0.08    -5.17   0.00
## above                                                          
## adultEDhigh school                 -0.08   0.06    -1.32   0.19
## grad/GED                                                       
## adultEDless than 9th grade         -0.19   0.09    -2.00   0.05
## adultEDsome college or AA          -0.18   0.07    -2.70   0.01
## citizenshipnot U,S,                 0.04   0.16     0.27   0.78
## citizen:adultEDcollege grad or                                 
## above                                                          
## citizenshipnot U,S,                 0.15   0.15     0.98   0.33
## citizen:adultEDhigh school                                     
## grad/GED                                                       
## citizenshipnot U,S,                 0.10   0.15     0.65   0.52
## citizen:adultEDless than 9th                                   
## grade                                                          
## citizenshipnot U,S,                 0.35   0.18     1.92   0.06
## citizen:adultEDsome college or                                 
## AA                                                             
## ---------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.49
summ(model_adult, robust = "HC1") # redundant for svyglm: design-based (robust) standard errors are already the default, hence the warning
## Warning in summ.svyglm(model_adult, robust = "HC1"): Robust standard errors are reported by default
##  in the survey package.
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.17 
## 
## Standard errors: Robust
## ---------------------------------------------------------------
##                                     Est.   S.E.   t val.      p
## -------------------------------- ------- ------ -------- ------
## (Intercept)                         4.47   0.11    39.64   0.00
## refEDpartial college and            0.14   0.07     2.07   0.04
## below                                                          
## ageolder adult                     -0.02   0.05    -0.38   0.71
## ageyoung adult                      0.08   0.06     1.32   0.19
## gendermale                          0.03   0.04     0.95   0.34
## ethnicityNon-Hispanic               0.47   0.07     6.61   0.00
## Black                                                          
## ethnicityNon-Hispanic              -0.39   0.07    -5.74   0.00
## White                                                          
## ethnicityOther Hispanic             0.05   0.08     0.62   0.54
## ethnicityOther or Multi            -0.71   0.09    -7.58   0.00
## fplfamily income 2x poverty         0.07   0.05     1.34   0.18
## threshold                                                      
## fplfamily income 3x poverty         0.09   0.07     1.31   0.19
## threshold                                                      
## fplfamily income 4x poverty         0.15   0.07     2.12   0.04
## threshold                                                      
## fplfamily income 5x poverty         0.16   0.08     2.09   0.04
## threshold                                                      
## fplfamily income more than          0.18   0.07     2.55   0.01
## 5x poverty threshold                                           
## citizenshipnot U,S,                 0.02   0.11     0.15   0.88
## citizen                                                        
## adultEDcollege grad or             -0.41   0.08    -5.17   0.00
## above                                                          
## adultEDhigh school                 -0.08   0.06    -1.32   0.19
## grad/GED                                                       
## adultEDless than 9th grade         -0.19   0.09    -2.00   0.05
## adultEDsome college or AA          -0.18   0.07    -2.70   0.01
## citizenshipnot U,S,                 0.04   0.16     0.27   0.78
## citizen:adultEDcollege grad or                                 
## above                                                          
## citizenshipnot U,S,                 0.15   0.15     0.98   0.33
## citizen:adultEDhigh school                                     
## grad/GED                                                       
## citizenshipnot U,S,                 0.10   0.15     0.65   0.52
## citizen:adultEDless than 9th                                   
## grade                                                          
## citizenshipnot U,S,                 0.35   0.18     1.92   0.06
## citizen:adultEDsome college or                                 
## AA                                                             
## ---------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.49
summ(model_adult, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ() reports 95% CIs by default
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.057
## Adj. R² = -0.172 
## 
## Standard errors: Robust
## ----------------------------------------------------------------------------
##                                      Est.     2.5%    97.5%   t val.       p
## -------------------------------- -------- -------- -------- -------- -------
## (Intercept)                         4.469    4.245    4.692   39.640   0.000
## refEDpartial college and            0.140    0.006    0.274    2.075   0.041
## below                                                                       
## ageolder adult                     -0.020   -0.129    0.088   -0.375   0.708
## ageyoung adult                      0.077   -0.039    0.193    1.318   0.190
## gendermale                          0.034   -0.037    0.106    0.949   0.345
## ethnicityNon-Hispanic               0.471    0.330    0.613    6.613   0.000
## Black                                                                       
## ethnicityNon-Hispanic              -0.388   -0.522   -0.254   -5.738   0.000
## White                                                                       
## ethnicityOther Hispanic             0.052   -0.115    0.219    0.617   0.538
## ethnicityOther or Multi            -0.707   -0.893   -0.522   -7.576   0.000
## fplfamily income 2x poverty         0.073   -0.035    0.180    1.345   0.182
## threshold                                                                   
## fplfamily income 3x poverty         0.088   -0.045    0.221    1.309   0.193
## threshold                                                                   
## fplfamily income 4x poverty         0.148    0.009    0.286    2.118   0.037
## threshold                                                                   
## fplfamily income 5x poverty         0.159    0.008    0.309    2.086   0.040
## threshold                                                                   
## fplfamily income more than          0.180    0.040    0.319    2.546   0.012
## 5x poverty threshold                                                        
## citizenshipnot U,S,                 0.016   -0.204    0.237    0.149   0.882
## citizen                                                                     
## adultEDcollege grad or             -0.406   -0.562   -0.250   -5.170   0.000
## above                                                                       
## adultEDhigh school                 -0.081   -0.203    0.040   -1.323   0.189
## grad/GED                                                                    
## adultEDless than 9th grade         -0.190   -0.378   -0.002   -2.002   0.048
## adultEDsome college or AA          -0.184   -0.319   -0.049   -2.697   0.008
## citizenshipnot U,S,                 0.044   -0.274    0.362    0.274   0.785
## citizen:adultEDcollege grad or                                              
## above                                                                       
## citizenshipnot U,S,                 0.149   -0.153    0.451    0.980   0.329
## citizen:adultEDhigh school                                                  
## grad/GED                                                                    
## citizenshipnot U,S,                 0.098   -0.203    0.399    0.646   0.520
## citizen:adultEDless than 9th                                                
## grade                                                                       
## citizenshipnot U,S,                 0.353   -0.011    0.716    1.925   0.057
## citizen:adultEDsome college or                                              
## AA                                                                          
## ----------------------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.487
summ(model_adult, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression 
## 
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.17 
## 
## Standard errors: Robust
## -----------------------------------------------------------------
##                                     Est.    2.5%   97.5%   t val.
## -------------------------------- ------- ------- ------- --------
## (Intercept)                         4.47    4.25    4.69    39.64
## refEDpartial college and            0.14    0.01    0.27     2.07
## below                                                            
## ageolder adult                     -0.02   -0.13    0.09    -0.38
## ageyoung adult                      0.08   -0.04    0.19     1.32
## gendermale                          0.03   -0.04    0.11     0.95
## ethnicityNon-Hispanic               0.47    0.33    0.61     6.61
## Black                                                            
## ethnicityNon-Hispanic              -0.39   -0.52   -0.25    -5.74
## White                                                            
## ethnicityOther Hispanic             0.05   -0.12    0.22     0.62
## ethnicityOther or Multi            -0.71   -0.89   -0.52    -7.58
## fplfamily income 2x poverty         0.07   -0.03    0.18     1.34
## threshold                                                        
## fplfamily income 3x poverty         0.09   -0.05    0.22     1.31
## threshold                                                        
## fplfamily income 4x poverty         0.15    0.01    0.29     2.12
## threshold                                                        
## fplfamily income 5x poverty         0.16    0.01    0.31     2.09
## threshold                                                        
## fplfamily income more than          0.18    0.04    0.32     2.55
## 5x poverty threshold                                             
## citizenshipnot U,S,                 0.02   -0.20    0.24     0.15
## citizen                                                          
## adultEDcollege grad or             -0.41   -0.56   -0.25    -5.17
## above                                                            
## adultEDhigh school                 -0.08   -0.20    0.04    -1.32
## grad/GED                                                         
## adultEDless than 9th grade         -0.19   -0.38   -0.00    -2.00
## adultEDsome college or AA          -0.18   -0.32   -0.05    -2.70
## citizenshipnot U,S,                 0.04   -0.27    0.36     0.27
## citizen:adultEDcollege grad or                                   
## above                                                            
## citizenshipnot U,S,                 0.15   -0.15    0.45     0.98
## citizen:adultEDhigh school                                       
## grad/GED                                                         
## citizenshipnot U,S,                 0.10   -0.20    0.40     0.65
## citizen:adultEDless than 9th                                     
## grade                                                            
## citizenshipnot U,S,                 0.35   -0.01    0.72     1.92
## citizen:adultEDsome college or                                   
## AA                                                               
## -----------------------------------------------------------------
## 
## Estimated dispersion parameter = 2.49
# THE GRAPH
plot_summs(model_adult)

plot_summs(model_adult, inner_ci_level = .9)

plot_summs(model_adult, robust = TRUE)

# plot coefficient uncertainty as normal distributions
plot_summs(model_adult, plot.distributions = TRUE, inner_ci_level = .9)

# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(model_adult, scale = TRUE)
Model 1
                                                            Est.      S.E.
(Intercept)                                                 4.47 ***  (0.11)
refEDpartial college and below                              0.14 *    (0.07)
ageolder adult                                             -0.02      (0.05)
ageyoung adult                                              0.08      (0.06)
gendermale                                                  0.03      (0.04)
ethnicityNon-Hispanic Black                                 0.47 ***  (0.07)
ethnicityNon-Hispanic White                                -0.39 ***  (0.07)
ethnicityOther Hispanic                                     0.05      (0.08)
ethnicityOther or Multi                                    -0.71 ***  (0.09)
fplfamily income 2x poverty threshold                       0.07      (0.05)
fplfamily income 3x poverty threshold                       0.09      (0.07)
fplfamily income 4x poverty threshold                       0.15 *    (0.07)
fplfamily income 5x poverty threshold                       0.16 *    (0.08)
fplfamily income more than 5x poverty threshold             0.18 *    (0.07)
citizenshipnot U,S, citizen                                 0.02      (0.11)
adultEDcollege grad or above                               -0.41 ***  (0.08)
adultEDhigh school grad/GED                                -0.08      (0.06)
adultEDless than 9th grade                                 -0.19 *    (0.09)
adultEDsome college or AA                                  -0.18 **   (0.07)
citizenshipnot U,S, citizen:adultEDcollege grad or above    0.04      (0.16)
citizenshipnot U,S, citizen:adultEDhigh school grad/GED     0.15      (0.15)
citizenshipnot U,S, citizen:adultEDless than 9th grade      0.10      (0.15)
citizenshipnot U,S, citizen:adultEDsome college or AA       0.35      (0.18)
N                                                          12132
R2                                                          0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.
# confidence intervals instead of standard errors
export_summs(model_adult, scale = TRUE,
             error_format = "[{conf.low}, {conf.high}]")
Model 1
                                                            Est.      95% CI
(Intercept)                                                 4.47 ***  [4.25, 4.69]
refEDpartial college and below                              0.14 *    [0.01, 0.27]
ageolder adult                                             -0.02      [-0.13, 0.09]
ageyoung adult                                              0.08      [-0.04, 0.19]
gendermale                                                  0.03      [-0.04, 0.11]
ethnicityNon-Hispanic Black                                 0.47 ***  [0.33, 0.61]
ethnicityNon-Hispanic White                                -0.39 ***  [-0.52, -0.25]
ethnicityOther Hispanic                                     0.05      [-0.12, 0.22]
ethnicityOther or Multi                                    -0.71 ***  [-0.89, -0.52]
fplfamily income 2x poverty threshold                       0.07      [-0.03, 0.18]
fplfamily income 3x poverty threshold                       0.09      [-0.05, 0.22]
fplfamily income 4x poverty threshold                       0.15 *    [0.01, 0.29]
fplfamily income 5x poverty threshold                       0.16 *    [0.01, 0.31]
fplfamily income more than 5x poverty threshold             0.18 *    [0.04, 0.32]
citizenshipnot U,S, citizen                                 0.02      [-0.20, 0.24]
adultEDcollege grad or above                               -0.41 ***  [-0.56, -0.25]
adultEDhigh school grad/GED                                -0.08      [-0.20, 0.04]
adultEDless than 9th grade                                 -0.19 *    [-0.38, -0.00]
adultEDsome college or AA                                  -0.18 **   [-0.32, -0.05]
citizenshipnot U,S, citizen:adultEDcollege grad or above    0.04      [-0.27, 0.36]
citizenshipnot U,S, citizen:adultEDhigh school grad/GED     0.15      [-0.15, 0.45]
citizenshipnot U,S, citizen:adultEDless than 9th grade      0.10      [-0.20, 0.40]
citizenshipnot U,S, citizen:adultEDsome college or AA       0.35      [-0.01, 0.72]
N                                                          12132
R2                                                          0.06
All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05.

Bayesian Information Criterion (BIC)

The model that performs best is Model B, a simple linear regression showing the logged mono-ethyl phthalate as a function of the reference person’s education level, the participant’s age, the participant’s ethnicity, the participant’s family income to poverty ratio, and the participant’s citizenship status.

XXX The delta BIC for Model C is 72.136, far above the delta BIC < 7 threshold, so Model C is very unlikely and we dismiss it. This suggests that citizenship status matters substantially for predicting the logged phthalate level in individuals.

XXX Model A and Model B are quite similar: the only difference is that Model B does not include gender. Model A, the second-best model, has a delta BIC of 6.590, which is still under the delta BIC < 7 threshold.
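
For reference, the textbook definition of BIC and of the delta BIC used above can be written as follows, where k is the number of estimated parameters, n is the number of observations, and L-hat is the maximized likelihood. This is background only: the BIC() method for svyglm objects in the survey package is a design-based analogue, so the values it reports are not expected to match this formula exactly.

```latex
\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L}), \qquad
\Delta\mathrm{BIC}_i = \mathrm{BIC}_i - \min_j \mathrm{BIC}_j
```

The model with the smallest BIC has a delta BIC of 0, and larger values indicate less support relative to that model.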

ANOVA

An ANOVA test is a statistical test of whether the means of two or more groups differ significantly. It works by comparing the variance between groups with the variance within groups.

Wald test

A Wald test is a parametric test of whether a set of independent variables is collectively significant in a model, that is, whether their coefficients are jointly different from zero.
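
To make the idea concrete, here is a minimal sketch of a Wald test computed by hand for an ordinary (non-survey) linear model. The data are simulated and the variable names (x1, g, y) are purely illustrative; none of this uses the NHANES data or the survey design. For a linear model, the Wald F statistic for dropping a term equals the F statistic from comparing the nested models with anova().

```r
# Simulated data, for illustration only (not NHANES)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
g  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y  <- 1 + 0.5 * x1 + rnorm(n)

full    <- lm(y ~ x1 + g)   # model with the term being tested
reduced <- lm(y ~ x1)       # model without it

# Coefficients and covariance matrix for the dummy variables of g
idx  <- grep("^g", names(coef(full)))
beta <- coef(full)[idx]
V    <- vcov(full)[idx, idx]

# Wald F statistic: (beta' V^-1 beta) / number of tested coefficients
wald_F  <- as.numeric(t(beta) %*% solve(V) %*% beta) / length(idx)
p_value <- pf(wald_F, length(idx), df.residual(full), lower.tail = FALSE)

# The same test via a nested-model comparison
anova_F <- anova(reduced, full)$F[2]

c(wald_F = wald_F, anova_F = anova_F, p_value = p_value)
```

In the survey setting, anova(model1, model2, method = "Wald") performs the analogous test using the design-based covariance matrix.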

modela<-svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)

modelb<-svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)

modelc<-svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit)

modeld<-svyglm(log(monoEthyl)~age+ethnicity+fpl, design=nhc, na.action = na.omit)

modele<-svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship+childED, design=nhc, na.action = na.omit)

########## BIC
BIC(modela, modelb, maximal=modela)
##       p      BIC    neff
## [1,] 16 45748.15     NaN
## [2,] 15 45739.74 4604.58
BIC(modelb, modelc, maximal=modelb)
##       p      BIC     neff
## [1,] 15 45738.41      NaN
## [2,] 14 45738.26 5288.861
BIC(modela, modelc, maximal=modela)
##       p      BIC     neff
## [1,] 16 45748.15      NaN
## [2,] 14 45739.74 4957.434
BIC(modela, modelb, modelc, maximal=modela)
##       p      BIC     neff
## [1,] 16 45748.15      NaN
## [2,] 15 45739.74 4604.580
## [3,] 14 45739.74 4957.434
### ???? see questions in "meetings with susie" google doc

BIC_list <- c(BIC(modela, maximal=modela), BIC(modelb, maximal=modela), BIC(modelc, maximal=modela))

model_output <- rbind(data.frame(glance(modela)), data.frame(glance(modelb)), data.frame(glance(modelc))) %>% select(BIC)

model_output <- mutate(model_output, delta.BIC = BIC-min(BIC_list))
model_output$model <- c("Model A", "Model B", "Model C")
model_output <- model_output[,c("model", "BIC", "delta.BIC")]

kable(model_output, format = "markdown", digits = 3, caption = "BIC, and Delta.BIC for the models. Delta BIC > 7 indicates models that should be dismissed from further consideration.")
BIC, and Delta.BIC for the models. Delta BIC > 7 indicates models that should be dismissed from further consideration.
model      BIC        delta.BIC
Model A    45748.15   NaN
Model B    45738.41   NaN
Model C    45803.82   NaN
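
The delta.BIC column above is NaN. A likely cause is that BIC_list mixes the BIC values with the p and neff entries that BIC() returns for survey models; neff is NaN for the maximal model, so min(BIC_list) is NaN as well. The following is a minimal sketch of the intended calculation using the BIC values printed in the table. The names bic and delta_bic are illustrative and not part of the original code.

```r
# BIC values copied from the table above
bic <- c("Model A" = 45748.15, "Model B" = 45738.41, "Model C" = 45803.82)

# Delta BIC: difference from the smallest BIC.
# na.rm = TRUE guards against stray NaN values in the input.
delta_bic <- round(bic - min(bic, na.rm = TRUE), 2)

model_output <- data.frame(model     = names(bic),
                           BIC       = unname(bic),
                           delta.BIC = unname(delta_bic))
print(model_output)
```

With these values Model B has the smallest BIC, and the differences for Model A and Model C come to 9.74 and 65.41. These differ from the delta BIC values quoted in the text, so the text may need to be checked against the current output.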
########## ANOVA
########## Wald test
anova(modela)
## Anova table:  (Rao-Scott LRT)
## svyglm(formula = log(monoEthyl) ~ refED, design = nhc, na.action = na.omit)
##                 stats       DEff         df ddf         p    
## refED       2668.6416    8.56660    1.00000 123 < 2.2e-16 ***
## age          480.9227    5.09450    3.00000 120 4.043e-13 ***
## gender         0.5746    4.22020    1.00000 119    0.7067    
## ethnicity   1796.7690    5.18580    4.00000 115 < 2.2e-16 ***
## fpl         3102.9314    4.66830    5.00000 110 < 2.2e-16 ***
## citizenship   75.3416    3.61700    1.00000 109 1.440e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(modelc, modela)
## Working (Rao-Scott+F) LRT for gender citizenship
##  in svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl + citizenship, design = nhc, na.action = na.omit)
## Working 2logLR =  19.30158 p= 0.00016308 
## (scale factors:  1.1 0.89 );  denominator df= 109
anova(modela, modelb, method = "Wald")
## Wald test for gender
##  in svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl + citizenship, design = nhc, na.action = na.omit)
## F =  0.03018945  on  1  and  109  df: p= 0.86238
# wald test for gender, p = 0.86238
anova(modelb, modelc, method = "Wald")
## Wald test for citizenship
##  in svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl + 
##     citizenship, design = nhc, na.action = na.omit)
## F =  8.405465  on  1  and  110  df: p= 0.0045171
# wald test for citizenship, p = 0.0045171
anova(modela, modelc, method = "Wald")
## Wald test for gender citizenship
##  in svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl + citizenship, design = nhc, na.action = na.omit)
## F =  4.298927  on  2  and  109  df: p= 0.015958
# wald test for gender citizenship, p = 0.015958
anova(modela, modeld, method = "Wald")
## Wald test for refED gender citizenship
##  in svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl + citizenship, design = nhc, na.action = na.omit)
## F =  22.43636  on  3  and  109  df: p= 2.1802e-11
# wald test for refED gender citizenship, p = 2.1802e-11

anova(modele)
## Anova table:  (Rao-Scott LRT)
## svyglm(formula = log(monoEthyl) ~ refED, design = nhc, na.action = na.omit)
##                 stats       DEff         df ddf         p    
## refED        2668.642     8.5666     1.0000 123 < 2.2e-16 ***
## age           480.923     5.0945     3.0000 120 4.043e-13 ***
## ethnicity    1797.344     5.1837     4.0000 116 < 2.2e-16 ***
## fpl          3102.892     4.6784     5.0000 111 < 2.2e-16 ***
## citizenship    75.255     3.6369     1.0000 110 1.517e-05 ***
## childED     38111.077     1.8139     1.0000 111 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# anova(modela, modele, method = "Wald")
## cannot do a Wald test because the two models are fit to different numbers of observations (rows with missing childED are dropped from modele)... does this mean that childED is only using 34% of the data, because 66% of it is missing?
vis_miss(fullNHANES_recat, sort_miss = TRUE)

Boxplots, bivariate analyses

# change the noNAs dataset with each boxplot I create:
## one for: refED age gender ethnicity fpl citizenship

noNAs <- fullNHANES_recat %>%
  filter(!is.na(citizenship), !is.na(monoEthyl))

# Note: ggplot() does not use a design argument, so this boxplot is unweighted
# and ignores the survey design. survey::svyboxplot() draws a weighted version.
box_citizenship <- ggplot(data = noNAs,
                          aes(x = log(monoEthyl), y = citizenship, fill = citizenship)) +
  scale_fill_brewer(palette = "PuBuGn") +
  geom_boxplot() +
  theme(text = element_text(size = 12)) +
  xlab("(logged) Mono-Ethyl Phthalate Level (ng/mL)") +
  ylab("Participant Citizenship Status") +
  ggtitle("Participant Citizenship Status and Logged Phthalate Level")


box_citizenship

Ordinary Least Squares (OLS)

Ordinary least squares (OLS) regression is a method that allows us to find a line that best describes the relationship between one or more predictor variables and a response variable. Source: https://www.statology.org/ols-regression-in-r/

Akaike Information Criterion (AIC)

AIC is an estimator of prediction error and thereby relative quality of statistical models for a given set of data.
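
For reference, the textbook definition of AIC is given below, where k is the number of estimated parameters and L-hat is the maximized likelihood. Lower values indicate a better trade-off between fit and complexity. This is stated as background only: the AIC printed for svyglm objects may be computed differently, and AIC values are comparable only between models fit to the same observations.

```latex
\mathrm{AIC} = 2k - 2 \ln(\hat{L})
```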

ols1 <- (svyglm(log(monoEthyl)~1, design=nhc, na.action = na.omit))
ols1
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ 1, design = nhc, na.action = na.omit)
## 
## Coefficients:
## (Intercept)  
##       4.182  
## 
## Degrees of Freedom: 21436 Total (i.e. Null);  124 Residual
##   (912 observations deleted due to missingness)
## Null Deviance:       53720 
## Residual Deviance: 53720     AIC: 88140
# this gives an AIC of 88140
## ^^ this is just practice... from the article I read online
### what does ~1 mean?
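
On the question in the comment above: in an R formula, `~ 1` requests an intercept-only model, with no predictors. The single fitted coefficient is then the mean of the outcome (the weighted mean when weights are supplied), which makes such a model a useful baseline for comparing AIC values. The sketch below illustrates this with simulated data and hypothetical weights in base R; it does not use the NHANES data or the survey design.

```r
# Simulated outcome and hypothetical weights, for illustration only
set.seed(42)
y <- rnorm(50, mean = 4)
w <- runif(50, min = 1, max = 10)

# Intercept-only models: "~ 1" means no predictors
fit_unweighted <- lm(y ~ 1)
fit_weighted   <- lm(y ~ 1, weights = w)

# The intercept equals the (weighted) mean of the outcome
c(intercept = unname(coef(fit_unweighted)), mean = mean(y))
c(intercept = unname(coef(fit_weighted)),   mean = weighted.mean(y, w))
```

By the same logic, the intercept of 4.182 in ols1 is the design-weighted mean of log(monoEthyl).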

# modela <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
ols_a <- (svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit))
ols_a
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl + citizenship, design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                        3.714420  
##                  refEDpartial college and below  
##                                        0.354819  
##                                  agemiddle-aged  
##                                        0.363008  
##                                  ageolder adult  
##                                        0.349740  
##                                  ageyoung adult  
##                                        0.443082  
##                                      gendermale  
##                                       -0.005086  
##                     ethnicityNon-Hispanic Black  
##                                        0.569355  
##                     ethnicityNon-Hispanic White  
##                                       -0.287305  
##                         ethnicityOther Hispanic  
##                                        0.146058  
##                         ethnicityOther or Multi  
##                                       -0.565951  
##           fplfamily income 2x poverty threshold  
##                                        0.030810  
##           fplfamily income 3x poverty threshold  
##                                        0.060688  
##           fplfamily income 4x poverty threshold  
##                                        0.107575  
##           fplfamily income 5x poverty threshold  
##                                        0.118099  
## fplfamily income more than 5x poverty threshold  
##                                        0.094568  
##                     citizenshipnot U,S, citizen  
##                                        0.168622  
## 
## Degrees of Freedom: 19217 Total (i.e. Null);  109 Residual
##   (3131 observations deleted due to missingness)
## Null Deviance:       48520 
## Residual Deviance: 45590     AIC: 77730
# this gives an AIC of 77,730

# modelb <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
ols_b <- (svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit))
ols_b
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl + 
##     citizenship, design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         3.71194  
##                  refEDpartial college and below  
##                                         0.35481  
##                                  agemiddle-aged  
##                                         0.36316  
##                                  ageolder adult  
##                                         0.35010  
##                                  ageyoung adult  
##                                         0.44318  
##                     ethnicityNon-Hispanic Black  
##                                         0.56960  
##                     ethnicityNon-Hispanic White  
##                                        -0.28719  
##                         ethnicityOther Hispanic  
##                                         0.14634  
##                         ethnicityOther or Multi  
##                                        -0.56566  
##           fplfamily income 2x poverty threshold  
##                                         0.03056  
##           fplfamily income 3x poverty threshold  
##                                         0.06039  
##           fplfamily income 4x poverty threshold  
##                                         0.10723  
##           fplfamily income 5x poverty threshold  
##                                         0.11772  
## fplfamily income more than 5x poverty threshold  
##                                         0.09414  
##                     citizenshipnot U,S, citizen  
##                                         0.16831  
## 
## Degrees of Freedom: 19217 Total (i.e. Null);  110 Residual
##   (3131 observations deleted due to missingness)
## Null Deviance:       48520 
## Residual Deviance: 45590     AIC: 77730
# this gives an AIC of 77,730
# this means that taking gender out does not improve or decrease the prediction of monoEthyl?

# modelc <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit)
ols_c <- (svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit))
ols_c
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl, 
##     design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         3.76310  
##                  refEDpartial college and below  
##                                         0.35058  
##                                  agemiddle-aged  
##                                         0.37458  
##                                  ageolder adult  
##                                         0.35633  
##                                  ageyoung adult  
##                                         0.45522  
##                     ethnicityNon-Hispanic Black  
##                                         0.52409  
##                     ethnicityNon-Hispanic White  
##                                        -0.33458  
##                         ethnicityOther Hispanic  
##                                         0.13437  
##                         ethnicityOther or Multi  
##                                        -0.58169  
##           fplfamily income 2x poverty threshold  
##                                         0.02752  
##           fplfamily income 3x poverty threshold  
##                                         0.05597  
##           fplfamily income 4x poverty threshold  
##                                         0.09813  
##           fplfamily income 5x poverty threshold  
##                                         0.10755  
## fplfamily income more than 5x poverty threshold  
##                                         0.08395  
## 
## Degrees of Freedom: 19234 Total (i.e. Null);  111 Residual
##   (3114 observations deleted due to missingness)
## Null Deviance:       48550 
## Residual Deviance: 45670     AIC: 77810
# this gives an AIC of 77,810
# this means that taking out citizenship decreases our ability to predict phthalate level?

#(try)
# take out ethnicity
ols_d <- (svyglm(log(monoEthyl)~refED+age+fpl+citizenship, design=nhc, na.action = na.omit))
ols_d
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + fpl + citizenship, 
##     design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         3.65224  
##                  refEDpartial college and below  
##                                         0.40997  
##                                  agemiddle-aged  
##                                         0.34705  
##                                  ageolder adult  
##                                         0.27423  
##                                  ageyoung adult  
##                                         0.42196  
##           fplfamily income 2x poverty threshold  
##                                        -0.03668  
##           fplfamily income 3x poverty threshold  
##                                        -0.05921  
##           fplfamily income 4x poverty threshold  
##                                        -0.03702  
##           fplfamily income 5x poverty threshold  
##                                        -0.04981  
## fplfamily income more than 5x poverty threshold  
##                                        -0.08725  
##                     citizenshipnot U,S, citizen  
##                                         0.18685  
## 
## Degrees of Freedom: 19217 Total (i.e. Null);  114 Residual
##   (3131 observations deleted due to missingness)
## Null Deviance:       48520 
## Residual Deviance: 47290     AIC: 78430
# this gives an AIC of 78,430

# take out age
ols_e <- (svyglm(log(monoEthyl)~refED+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit))
ols_e
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + ethnicity + fpl + citizenship, 
##     design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         3.91599  
##                  refEDpartial college and below  
##                                         0.36530  
##                     ethnicityNon-Hispanic Black  
##                                         0.61722  
##                     ethnicityNon-Hispanic White  
##                                        -0.22422  
##                         ethnicityOther Hispanic  
##                                         0.17728  
##                         ethnicityOther or Multi  
##                                        -0.53028  
##           fplfamily income 2x poverty threshold  
##                                         0.04858  
##           fplfamily income 3x poverty threshold  
##                                         0.07574  
##           fplfamily income 4x poverty threshold  
##                                         0.13273  
##           fplfamily income 5x poverty threshold  
##                                         0.15205  
## fplfamily income more than 5x poverty threshold  
##                                         0.14101  
##                     citizenshipnot U,S, citizen  
##                                         0.24904  
## 
## Degrees of Freedom: 19217 Total (i.e. Null);  113 Residual
##   (3131 observations deleted due to missingness)
## Null Deviance:       48520 
## Residual Deviance: 46040     AIC: 77910
# this gives an AIC of 77,910

# take out refED
ols_f <- (svyglm(log(monoEthyl)~age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit))
ols_f
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ age + ethnicity + fpl + citizenship, 
##     design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         4.06866  
##                                  agemiddle-aged  
##                                         0.36902  
##                                  ageolder adult  
##                                         0.36568  
##                                  ageyoung adult  
##                                         0.44037  
##                     ethnicityNon-Hispanic Black  
##                                         0.53583  
##                     ethnicityNon-Hispanic White  
##                                        -0.34139  
##                         ethnicityOther Hispanic  
##                                         0.13326  
##                         ethnicityOther or Multi  
##                                        -0.67698  
##           fplfamily income 2x poverty threshold  
##                                         0.01900  
##           fplfamily income 3x poverty threshold  
##                                         0.02816  
##           fplfamily income 4x poverty threshold  
##                                         0.05634  
##           fplfamily income 5x poverty threshold  
##                                         0.03176  
## fplfamily income more than 5x poverty threshold  
##                                        -0.06905  
##                     citizenshipnot U,S, citizen  
##                                         0.14984  
## 
## Degrees of Freedom: 19782 Total (i.e. Null);  111 Residual
##   (2566 observations deleted due to missingness)
## Null Deviance:       50110 
## Residual Deviance: 47490     AIC: 80310
# this gives an AIC of 80,310

# take out fpl
ols_g <- (svyglm(log(monoEthyl)~refED+age+ethnicity+citizenship, design=nhc, na.action = na.omit))
ols_g
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + citizenship, 
##     design = nhc, na.action = na.omit)
## 
## Coefficients:
##                    (Intercept)  refEDpartial college and below  
##                         3.7671                          0.3202  
##                 agemiddle-aged                  ageolder adult  
##                         0.3936                          0.3435  
##                 ageyoung adult     ethnicityNon-Hispanic Black  
##                         0.4436                          0.5683  
##    ethnicityNon-Hispanic White         ethnicityOther Hispanic  
##                        -0.2672                          0.1505  
##        ethnicityOther or Multi     citizenshipnot U,S, citizen  
##                        -0.5109                          0.1526  
## 
## Degrees of Freedom: 20651 Total (i.e. Null);  115 Residual
##   (1697 observations deleted due to missingness)
## Null Deviance:       51730 
## Residual Deviance: 48680     AIC: 83550
# this gives an AIC of 83,550

# take out citizenship
ols_h <- (svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit))
ols_h
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl, 
##     design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         3.76310  
##                  refEDpartial college and below  
##                                         0.35058  
##                                  agemiddle-aged  
##                                         0.37458  
##                                  ageolder adult  
##                                         0.35633  
##                                  ageyoung adult  
##                                         0.45522  
##                     ethnicityNon-Hispanic Black  
##                                         0.52409  
##                     ethnicityNon-Hispanic White  
##                                        -0.33458  
##                         ethnicityOther Hispanic  
##                                         0.13437  
##                         ethnicityOther or Multi  
##                                        -0.58169  
##           fplfamily income 2x poverty threshold  
##                                         0.02752  
##           fplfamily income 3x poverty threshold  
##                                         0.05597  
##           fplfamily income 4x poverty threshold  
##                                         0.09813  
##           fplfamily income 5x poverty threshold  
##                                         0.10755  
## fplfamily income more than 5x poverty threshold  
##                                         0.08395  
## 
## Degrees of Freedom: 19234 Total (i.e. Null);  111 Residual
##   (3114 observations deleted due to missingness)
## Null Deviance:       48550 
## Residual Deviance: 45670     AIC: 77810
# this gives an AIC of 77,810

# add in childED and gender
ols_i <- (svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+childED, design=nhc, na.action = na.omit))
ols_i
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity + 
##     fpl + citizenship + childED, design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         3.68377  
##                  refEDpartial college and below  
##                                         0.32922  
##                                  ageyoung adult  
##                                        -0.12090  
##                                      gendermale  
##                                        -0.18140  
##                     ethnicityNon-Hispanic Black  
##                                         0.64594  
##                     ethnicityNon-Hispanic White  
##                                        -0.18883  
##                         ethnicityOther Hispanic  
##                                         0.26191  
##                         ethnicityOther or Multi  
##                                        -0.24214  
##           fplfamily income 2x poverty threshold  
##                                        -0.06454  
##           fplfamily income 3x poverty threshold  
##                                         0.00301  
##           fplfamily income 4x poverty threshold  
##                                         0.07541  
##           fplfamily income 5x poverty threshold  
##                                         0.10752  
## fplfamily income more than 5x poverty threshold  
##                                        -0.08216  
##                     citizenshipnot U,S, citizen  
##                                         0.12371  
##                                childEDsecondary  
##                                         0.62523  
## 
## Degrees of Freedom: 6618 Total (i.e. Null);  110 Residual
##   (15730 observations deleted due to missingness)
## Null Deviance:       8327 
## Residual Deviance: 7446  AIC: 24780
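# this gives an AIC of 24,780
# (without gender, AIC of 24,800)
# note: this model uses only 6,619 observations (15,730 deleted due to missingness),
# so its AIC cannot be compared with the AICs of the models above, which use far more observations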

# add in adultED
ols_j <- (svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship+adultED, design=nhc, na.action = na.omit))
ols_j
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Call:  svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl + 
##     citizenship + adultED, design = nhc, na.action = na.omit)
## 
## Coefficients:
##                                     (Intercept)  
##                                         4.46006  
##                  refEDpartial college and below  
##                                         0.14025  
##                                  ageolder adult  
##                                        -0.02136  
##                                  ageyoung adult  
##                                         0.07768  
##                     ethnicityNon-Hispanic Black  
##                                         0.47990  
##                     ethnicityNon-Hispanic White  
##                                        -0.37930  
##                         ethnicityOther Hispanic  
##                                         0.05900  
##                         ethnicityOther or Multi  
##                                        -0.70407  
##           fplfamily income 2x poverty threshold  
##                                         0.07539  
##           fplfamily income 3x poverty threshold  
##                                         0.09065  
##           fplfamily income 4x poverty threshold  
##                                         0.15013  
##           fplfamily income 5x poverty threshold  
##                                         0.16150  
## fplfamily income more than 5x poverty threshold  
##                                         0.18239  
##                     citizenshipnot U,S, citizen  
##                                         0.15135  
##                    adultEDcollege grad or above  
##                                        -0.39638  
##                     adultEDhigh school grad/GED  
##                                        -0.06412  
##                      adultEDless than 9th grade  
##                                        -0.18396  
##                       adultEDsome college or AA  
##                                        -0.15959  
## 
## Degrees of Freedom: 12131 Total (i.e. Null);  107 Residual
##   (10217 observations deleted due to missingness)
## Null Deviance:       39230 
## Residual Deviance: 37010     AIC: 49030
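# this gives an AIC of 49,030
# with gender, AIC is the same
# note: this model uses 12,132 observations (10,217 deleted due to missingness),
# so its AIC cannot be compared with the AICs of models fit to a different set of observations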
predmarg<-svypredmeans(ols1, ~interaction(gender,ethnicity))
predmarg
##                             mean     SE
## male.Non-Hispanic Black   4.8278 0.0490
## male.Other or Multi       3.7870 0.0771
## female.Non-Hispanic White 4.0267 0.0377
## female.Mexican American   4.3944 0.0579
## male.Mexican American     4.3826 0.0536
## male.Non-Hispanic White   4.0404 0.0328
## female.Non-Hispanic Black 4.9654 0.0433
## female.Other Hispanic     4.5672 0.0841
## female.Other or Multi     3.7306 0.0690
## male.Other Hispanic       4.4880 0.0758

Non-parametric tests

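Non-parametric tests can also be done. Let’s start with a Wilcoxon rank-sum test, which is the non-parametric analog of an independent-samples t-test. Because age has more than two categories here, svyranktest reports the multi-group (Kruskal-Wallis-type) version of the test.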

wil <- svyranktest(log(monoEthyl)~age, design = nhc, na = TRUE, test = c("wilcoxon"))
wil
## 
##  Design-based KruskalWallis test
## 
## data:  log(monoEthyl) ~ age
## df = 3, Chisq = 155, p-value < 2.2e-16

This is an example of a median test.

mtest <- svyranktest(log(monoEthyl)~age, design = nhc, na = TRUE, test=("median"))
mtest
## 
##  Design-based median test
## 
## data:  log(monoEthyl) ~ age
## df = 3, Chisq = 131.71, p-value < 2.2e-16

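This is an example of a Kruskal-Wallis test, which is the non-parametric analog of a one-way ANOVA. Because refED has only two categories, the output reports a t statistic for the difference in mean rank score.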

kwtest <- svyranktest(log(monoEthyl)~refED, design = nhc, na = TRUE, test=("KruskalWallis"))
kwtest
## 
##  Design-based KruskalWallis test
## 
## data:  log(monoEthyl) ~ refED
## t = 9.0082, df = 123, p-value = 3.303e-15
## alternative hypothesis: true difference in mean rank score is not equal to 0
## sample estimates:
## difference in mean rank score 
##                    0.07585648

Logistic regression

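Let’s see a few examples of logistic regression. Wrapping the outcome in “as.factor” is what lets this code run, but note what it does: with a factor response, the binomial family treats the first level as failure and all other levels as success, so this model contrasts the lowest observed value of log(monoEthyl) with every other value. The very large coefficients and the 22 Fisher scoring iterations below are consistent with that, so the estimates should not be read as a meaningful model of phthalate level.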

logit1 <- (svyglm(as.factor(log(monoEthyl))~as.factor(refED)+RIDAGEYR, family=quasibinomial, design=nhc, na.action = na.omit))
summary(logit1)
## 
## Call:
## svyglm(formula = as.factor(log(monoEthyl)) ~ as.factor(refED) + 
##     RIDAGEYR, design = nhc, family = quasibinomial, na.action = na.omit)
## 
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA, 
##     nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## 
## Coefficients:
##                                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                24.75845    0.81423  30.407   <2e-16
## as.factor(refED)partial college and below -16.42189    0.48371 -33.950   <2e-16
## RIDAGEYR                                   -0.02203    0.01672  -1.317     0.19
##                                              
## (Intercept)                               ***
## as.factor(refED)partial college and below ***
## RIDAGEYR                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.7029368)
## 
## Number of Fisher Scoring iterations: 22
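
A binomial-family model is most naturally fitted to an outcome with exactly two levels. As a minimal sketch of that case, the code below fits a survey-weighted logistic regression to a two-level outcome from the api example data that ships with the survey package. The design dstrat and the variables sch.wide, ell, meals and mobility come from that example data, not from NHANES.

```r
library(survey)

# Example data shipped with the survey package; NOT the NHANES file.
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# sch.wide is a two-level factor. The first level is treated as "failure",
# so the model describes the log-odds of the second level.
levels(apistrat$sch.wide)

logit_bin <- svyglm(sch.wide ~ ell + meals + mobility,
                    design = dstrat, family = quasibinomial())
summary(logit_bin)

# Odds ratios with 95% confidence intervals
exp(cbind(OR = coef(logit_bin), confint(logit_bin)))
```

Exponentiating the coefficients turns log-odds into odds ratios, which are usually easier to report.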

Logistic regression on a subpopulation (respondents older than 19)

subset1 <- subset(nhc, RIDAGEYR > 19)
logit2 <- (svyglm(as.factor(log(monoEthyl))~as.factor(refED)+RIDAGEYR+as.factor(ethnicity)+as.factor(fpl)+as.factor(citizenship), family=quasibinomial, design=subset1, na.action = na.omit))
summary(logit2)
## 
## Call:
## svyglm(formula = as.factor(log(monoEthyl)) ~ as.factor(refED) + 
##     RIDAGEYR + as.factor(ethnicity) + as.factor(fpl) + as.factor(citizenship), 
##     design = subset1, family = quasibinomial, na.action = na.omit)
## 
## Survey design:
## subset(nhc, RIDAGEYR > 19)
## 
## Coefficients:
##                                                             Estimate Std. Error
## (Intercept)                                                 59.98801    1.27065
## as.factor(refED)partial college and below                  -19.14015    0.65048
## RIDAGEYR                                                     0.01573    0.02980
## as.factor(ethnicity)Non-Hispanic Black                       0.46688    0.13690
## as.factor(ethnicity)Non-Hispanic White                     -17.16374    0.58293
## as.factor(ethnicity)Other Hispanic                           0.13812    0.18767
## as.factor(ethnicity)Other or Multi                           0.23037    0.23507
## as.factor(fpl)family income 2x poverty threshold             0.24851    0.17770
## as.factor(fpl)family income 3x poverty threshold           -18.53777    0.73302
## as.factor(fpl)family income 4x poverty threshold             0.50858    0.21199
## as.factor(fpl)family income 5x poverty threshold             0.41368    0.19092
## as.factor(fpl)family income more than 5x poverty threshold -18.77863    0.87613
## as.factor(citizenship)not U,S, citizen                      15.53777    0.61753
##                                                            t value Pr(>|t|)    
## (Intercept)                                                 47.211  < 2e-16 ***
## as.factor(refED)partial college and below                  -29.424  < 2e-16 ***
## RIDAGEYR                                                     0.528 0.598645    
## as.factor(ethnicity)Non-Hispanic Black                       3.410 0.000903 ***
## as.factor(ethnicity)Non-Hispanic White                     -29.444  < 2e-16 ***
## as.factor(ethnicity)Other Hispanic                           0.736 0.463291    
## as.factor(ethnicity)Other or Multi                           0.980 0.329191    
## as.factor(fpl)family income 2x poverty threshold             1.398 0.164730    
## as.factor(fpl)family income 3x poverty threshold           -25.290  < 2e-16 ***
## as.factor(fpl)family income 4x poverty threshold             2.399 0.018087 *  
## as.factor(fpl)family income 5x poverty threshold             2.167 0.032376 *  
## as.factor(fpl)family income more than 5x poverty threshold -21.434  < 2e-16 ***
## as.factor(citizenship)not U,S, citizen                      25.161  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.1814485)
## 
## Number of Fisher Scoring iterations: 24

The same subpopulation can be reused for other models, here with adultED in place of refED.

logit3 <- (svyglm(as.factor(log(monoEthyl))~as.factor(adultED)+RIDAGEYR+as.factor(fpl), family=quasibinomial, design=subset1, na.action = na.omit))
summary(logit3)
## 
## Call:
## svyglm(formula = as.factor(log(monoEthyl)) ~ as.factor(adultED) + 
##     RIDAGEYR + as.factor(fpl), design = subset1, family = quasibinomial, 
##     na.action = na.omit)
## 
## Survey design:
## subset(nhc, RIDAGEYR > 19)
## 
## Coefficients:
##                                                              Estimate
## (Intercept)                                                 41.959437
## as.factor(adultED)college grad or above                      1.690212
## as.factor(adultED)high school grad/GED                     -17.539382
## as.factor(adultED)less than 9th grade                       -0.737155
## as.factor(adultED)some college or AA                       -17.604884
## RIDAGEYR                                                     0.008761
## as.factor(fpl)family income 2x poverty threshold             0.171777
## as.factor(fpl)family income 3x poverty threshold           -18.674215
## as.factor(fpl)family income 4x poverty threshold             0.370404
## as.factor(fpl)family income 5x poverty threshold             0.259681
## as.factor(fpl)family income more than 5x poverty threshold -18.969089
##                                                            Std. Error t value
## (Intercept)                                                  1.318773  31.817
## as.factor(adultED)college grad or above                      0.582902   2.900
## as.factor(adultED)high school grad/GED                       0.764768 -22.934
## as.factor(adultED)less than 9th grade                        0.509135  -1.448
## as.factor(adultED)some college or AA                         0.842382 -20.899
## RIDAGEYR                                                     0.030038   0.292
## as.factor(fpl)family income 2x poverty threshold             0.157567   1.090
## as.factor(fpl)family income 3x poverty threshold             0.697581 -26.770
## as.factor(fpl)family income 4x poverty threshold             0.303998   1.218
## as.factor(fpl)family income 5x poverty threshold             0.319749   0.812
## as.factor(fpl)family income more than 5x poverty threshold   0.997105 -19.024
##                                                            Pr(>|t|)    
## (Intercept)                                                 < 2e-16 ***
## as.factor(adultED)college grad or above                     0.00448 ** 
## as.factor(adultED)high school grad/GED                      < 2e-16 ***
## as.factor(adultED)less than 9th grade                       0.15040    
## as.factor(adultED)some college or AA                        < 2e-16 ***
## RIDAGEYR                                                    0.77106    
## as.factor(fpl)family income 2x poverty threshold            0.27793    
## as.factor(fpl)family income 3x poverty threshold            < 2e-16 ***
## as.factor(fpl)family income 4x poverty threshold            0.22557    
## as.factor(fpl)family income 5x poverty threshold            0.41840    
## as.factor(fpl)family income more than 5x poverty threshold  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.2117443)
## 
## Number of Fisher Scoring iterations: 24
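
The subpopulation models above call subset on the design object rather than filtering the data frame before svydesign, which keeps the design information attached to the analysis. A minimal sketch of the same pattern, using the api example data from the survey package (the names dstrat, stype, api00 and sch.wide belong to that example data, not to NHANES):

```r
library(survey)

# Example data shipped with the survey package; NOT the NHANES file.
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Subset the design object, not the underlying data frame
elem <- subset(dstrat, stype == "E")

# Estimates and models then take the subsetted design
svymean(~api00, design = elem)
logit_sub <- svyglm(sch.wide ~ ell + meals, design = elem,
                    family = quasibinomial())
summary(logit_sub)
```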

We can also get a Wald test for a variable in the model.

regTermTest(logit2, ~RIDAGEYR)
## Wald test for RIDAGEYR
##  in svyglm(formula = as.factor(log(monoEthyl)) ~ as.factor(refED) + 
##     RIDAGEYR + as.factor(ethnicity) + as.factor(fpl) + as.factor(citizenship), 
##     design = subset1, family = quasibinomial, na.action = na.omit)
## F =  0.2786305  on  1  and  112  df: p= 0.59864
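
For a term with a single coefficient, the Wald F statistic is the square of that coefficient's t value: in the output above, 0.528 squared is about 0.279, which matches F = 0.2786. The test is more useful for factors, where it tests all of the dummy coefficients jointly. A minimal sketch using the api example data from the survey package (the model and variable names are illustrative, not part of the NHANES analysis):

```r
library(survey)

# Example data shipped with the survey package; NOT the NHANES file.
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)
m <- svyglm(sch.wide ~ ell + meals + stype, design = dstrat,
            family = quasibinomial())

# Single-coefficient term: Wald F equals the squared t value
w_ell <- regTermTest(m, ~ell)
t_ell <- summary(m)$coefficients["ell", "t value"]
c(F = as.numeric(w_ell$Ftest), t_squared = t_ell^2)

# Multi-coefficient term: one joint test for both stype dummies
w_stype <- regTermTest(m, ~stype)
w_stype
```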

Linear regression reports an R-squared value; logistic regression has no exact equivalent, so a pseudo-R-squared is used instead. There are many different versions of pseudo-R-squared, and two of them, Cox-Snell and Nagelkerke, are available with the psrsq function.

psrsq(logit2, method = c("Cox-Snell"))
## [1] 0.001953773
psrsq(logit2, method = c("Nagelkerke"))
## [1] 0.2118337
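
The Nagelkerke version rescales the Cox-Snell value by its maximum attainable value, so it is never smaller than Cox-Snell for a model that improves on the null model. A minimal sketch comparing the two on the api example data from the survey package (model and variable names are illustrative only):

```r
library(survey)

# Example data shipped with the survey package; NOT the NHANES file.
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)
m <- svyglm(sch.wide ~ ell + meals + mobility, design = dstrat,
            family = quasibinomial())

cs <- psrsq(m, method = "Cox-Snell")
nk <- psrsq(m, method = "Nagelkerke")

# Nagelkerke is Cox-Snell divided by a scaling factor between 0 and 1
c(CoxSnell = cs, Nagelkerke = nk)
```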

Ordered logistic regression

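Below is an example of an ordered logistic regression. Note that the outcome variable must be a factor, and that the order of its levels defines the ordering of the outcome. ethnicity has no natural order (its levels are simply alphabetical, as the intercepts in the output show), so this example illustrates the syntax only.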

ologit1 <- svyolr(as.factor(ethnicity)~as.factor(gender)+as.factor(citizenship)+RIDAGEYR, design = nhc, method = c("logistic"))
summary(ologit1)
## Call:
## svyolr(as.factor(ethnicity) ~ as.factor(gender) + as.factor(citizenship) + 
##     RIDAGEYR, design = nhc, method = c("logistic"))
## 
## Coefficients:
##                                               Value   Std. Error   t value
## as.factor(gender)male                  -0.028130504 0.0254625109 -1.104781
## as.factor(citizenship)not U,S, citizen -0.511256670 0.2262551060 -2.259647
## RIDAGEYR                                0.006759457 0.0007330932  9.220461
## 
## Intercepts:
##                                       Value    Std. Error t value 
## Mexican American|Non-Hispanic Black    -2.0617   0.0842   -24.4983
## Non-Hispanic Black|Non-Hispanic White  -1.0566   0.0735   -14.3677
## Non-Hispanic White|Other Hispanic       2.1920   0.0695    31.5463
## Other Hispanic|Other or Multi           2.8355   0.0673    42.1320
## (42 observations deleted due to missingness)
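
An outcome with a natural order makes the coefficients easier to interpret. The sketch below follows the example in the svyolr documentation: it uses the api example data from the survey package and cuts the percentage variable meals into four ordered bands. The names dclus1, mealcat, avg.ed, mobility and stype belong to that example, not to NHANES.

```r
library(survey)

# Example data shipped with the survey package; NOT the NHANES file.
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Build an outcome whose levels have a natural order:
# four bands of the percentage of students eligible for subsidized meals
dclus1 <- update(dclus1, mealcat = cut(meals, c(0, 25, 50, 75, 100)))

ologit_ex <- svyolr(mealcat ~ avg.ed + mobility + stype, design = dclus1)
summary(ologit_ex)
```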

Other types of analyses available in the survey package

There are many more types of analyses that are available in the survey package and in other packages that work with complex survey data. A few examples:

Principal components analysis (PCA).

pc <- svyprcomp(~monoEthyl+gender+refED, design=nhc,scale=TRUE,scores=TRUE)
pc
## Standard deviations (1, .., p=4):
## [1] 1.3128311 1.0579079 0.9720452 0.4609050
## 
## Rotation (n x k) = (4 x 4):
##                                         PC1       PC2         PC3         PC4
## monoEthyl                      -0.004235365 0.5636092 -0.82582215  0.01856026
## genderfemale                   -0.711199991 0.1409689  0.08449453 -0.68350788
## gendermale                      0.701706650 0.1938654  0.11351522 -0.67612002
## refEDpartial college and below -0.042242310 0.7904990  0.54588712  0.27447078
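
PCA is usually applied to numeric variables; when a factor appears in the formula it enters as dummy columns, as the genderfemale and gendermale rows above show. A minimal sketch with numeric variables only, using the api example data from the survey package (variable names are from that example, not NHANES):

```r
library(survey)

# Example data shipped with the survey package; NOT the NHANES file.
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Weighted PCA on four numeric school-level variables
pc_ex <- svyprcomp(~api99 + api00 + ell + meals, design = dclus1,
                   scale. = TRUE, scores = TRUE)
pc_ex

# Proportion of total variance carried by each component
prop_var <- pc_ex$sdev^2 / sum(pc_ex$sdev^2)
prop_var
```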

Cronbach’s alpha.

svycralpha(~log(monoEthyl)+RIDAGEYR, design=nhc, na.rm = TRUE)
##    *alpha* 
## 0.01325312
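
Cronbach’s alpha compares the sum of the item variances with the variance of the total score: for K items, alpha = K/(K-1) * (1 - sum of item variances / variance of the total). It is mainly meaningful when the items are intended to measure the same thing. A minimal sketch on the api example data from the survey package, following the formula used in the svycralpha documentation example (the variables are from that example data, not NHANES):

```r
library(survey)

# Example data shipped with the survey package; NOT the NHANES file.
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Design-based Cronbach's alpha for five school-level items;
# na.rm = TRUE drops observations with missing items
alpha_ex <- svycralpha(~ell + mobility + avg.ed + emer + meals,
                       design = dstrat, na.rm = TRUE)
alpha_ex
```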